State-By-State Comparisons of Student Achievement: The Definition
of the Content Domain for Assessment

Robert L. Linn 
University of Colorado, Boulder 

Twenty years ago when the National Assessment of Educational Progress 
(NAEP) was being designed, care was taken to ensure that the data would not 
allow comparisons among individual states or localities. There were a variety 
of reasons for this decision, including considerations of cost, political 
viability, and concerns about the likely misuse of state average scores on the 
assessment. Today, however, the lack of information at the level of individual 
states has been judged to be the most serious weakness of NAEP by the blue 
ribbon panel that was constituted to review NAEP and make recommendations about
its future (Alexander-James, 1987).

The NAEP Study Group, which was chaired by Governor Lamar Alexander and 
directed by H. Thomas James, identified the development of state-by-state
comparative data as its number one priority. The Study Group reasoned that most
"important decisions in education are made at the state or local level, and
accountability for performance is vested at those levels" (p. 4). They also
implied that the decision makers at the state or local level would benefit from
comparative information, but did not explicitly state how such information would
be used to make better educational decisions.

The Study Group considered some of the concerns that, in the past, had led 
to a decision to prevent the use of NAEP for purposes of making state-by-state 
comparisons, but concluded that the "concerns are less important now than they
were previously, and that most can be readily accommodated within a redesigned
national assessment" (Alexander-James, 1987, p. 5). Having thus dismissed the
objections to state-to-state comparisons under the heading "previous concerns
about comparisons", the Study Group was ready to give its most important
recommendation.

The single most important change recommended by the Study Group is that the
assessment collect representative data on achievement in each of the fifty
states and the District of Columbia. Today state and local school
administrators are encountering a rising public demand for thorough
information on the quality of their schools, allowing comparison with data
from other states and districts and with their own historical records.
Responding to calls for greater accountability and for substantive school
improvements, state officials have increasingly turned to the national
assessment for assistance (pp. 11-12).

The movement toward state-by-state comparisons, of course, did not begin 
with the Alexander-James Study Group. Rather, the Study Group endorsed a
position that had already garnered considerable support from policy makers
during the past five years and pointed to the redesign of NAEP as a mechanism
for obtaining the desired comparisons. The movement toward state-by-state
comparisons was encouraged earlier by the U.S. Department of Education and by
the Council of Chief State School Officers.

The Council of Chief State School Officers has provided considerable
support for the idea of state-by-state comparisons during the past three years
since the Council adopted a position paper encouraging states to develop
comparable measures of student achievement in reading, mathematics, English,
science, and social studies. The subsequent establishment of the State
Education Assessment Center by the Council with the support of the Center for
Statistics and the Mott Foundation and the activities of the Assessment Center
and the Council since that time have given greater strength to the movement
toward making state-by-state comparisons a reality. With support from the U.S.
Department of Education and the National Science Foundation, the Council is now
in the process of forming a consortium of educators that will develop specific
recommendations for the first state-by-state assessment of student achievement
in mathematics.

As Ramsay Selden (1986a), the director of the State Education Assessment
Center, has noted, any approach that is taken to the development of a
system that will yield state-by-state comparisons of student achievement will
raise "profound issues in educational measurement" (p. 2). Selden went on to
discuss some of those issues and highlighted the need to deal with issues of
validity. The focus of this paper is on a limited set of issues related to the
validity of the assessment system. More specifically, the purpose of this paper
is to review issues concerning the definition of the domain of content to be
covered in the assessment and the relationship of the definition and score
reporting systems to the validity of inferences that are based on state-by-state
comparisons.

Validity

As with any use of tests, the most fundamental measurement issue in the
development of an assessment system that will provide state-by-state comparisons
is the validity of the inferences that will be made from the scores. To date,
however, relatively little serious attention has been given to the questions of
validity of a NAEP-based state-by-state comparison system, or for that matter,
any other system other than the seriously flawed use of ACT and SAT scores as
indicators of the educational quality in a state.

Although not couched in terms of validity, the primary concern that was
raised in the National Academy of Education's review committee commentary on the
Alexander-James report is fundamentally an issue of validity. The Review
Committee (National Academy of Education, 1987, p. 59) summarized its
reservations about the recommendation that NAEP be redesigned to provide
state-by-state comparisons as follows.

We are concerned about the emphasis in the Alexander-James report on
state-by-state comparisons of average test scores. Many factors influence the
relative rankings of states, districts, and schools. Simple comparisons 
are ripe for abuse and are unlikely to inform meaningful school improvement 
efforts. 



As is clearly implied by the above statement, the Review Committee's
concern applies not only to the proposed state-by-state comparisons using NAEP
but to the use of average test scores for other units such as individual school
buildings or school districts. The concern is not limited to the use of NAEP.
It would apply equally well to the use of other assessment devices or tests.
The concern is clearly with the inferences that the Review Committee anticipated
would be made from the test data. The validity of those inferences will
depend on a wide variety of factors, such as the degree of standardization of
the rules for inclusion and exclusion of students in the assessment, the
specific sampling procedures, and the administration procedures. One of the
important factors that will influence the validity of the inferences drawn from
the comparisons, however, is the adequacy of the content coverage of the
assessment.

Content Domain

It is one thing to agree that the assessment should cover the "core content
areas (reading, writing, and literacy; mathematics, science, and technology;
history, geography, and civics)" (Alexander-James, 1987, p. 12), but quite
another to agree that a particular set of topics in, say, history, much less
that a specific set of items, should be included on the assessment that is to be
used to compare states. It is also much easier to achieve agreement that "the
assessment instruments should examine acquisition of pertinent 'higher-order'
skills as well as basic skills, knowledge, and concepts" (Alexander-James, 1987,
p. 8), than it is to gain consensus that a given exercise is a fair assessment
of higher-order thinking skills. Many of the issues that arise when a school or
district selects a test are also relevant at the state level. Among these are
the issues of the breadth of the coverage, the match between what is taught and
what is tested, the number and specificity of the scores that are reported, and
the familiarity of the assessment procedures that are used.

Breadth of Coverage and the Match with What is Taught. Since the issues of
breadth of coverage and that of the degree to which the assessment matches the
curriculum and what is actually taught in classrooms are closely related, they
will be considered together. One approach to the determination of the content
to be included in an assessment would be to require a consensus among all states
that a given topic or assessment exercise is appropriate to the state's
instructional goals for students at a given point in their educational program.
As Selden (1986a) has noted, the consensus about a "common body of knowledge
could be conceived as a 'least common set': that content which is pursued to
some degree by schools in each [state], but excluding anything which all states
cannot be presumed to be teaching or emphasizing. Alternatively, it could be
conceived as an 'optimal set', around which consensus can be reached, but which
may not reflect everything some states are pursuing, and which may include some
items that some states may not be pursuing or emphasizing" (p. 7). To these two
alternatives could be added, at least in theory, an "inclusive set", that
content that is judged to be appropriate by one or more states.
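
The two ends of this range can be read as set operations on the states' lists of instructional goals: the least common set is the intersection of the lists, and the inclusive set is their union. The following is a minimal sketch under that reading. The three states and their topics are hypothetical, and the threshold `k` is an illustrative device introduced here, not part of Selden's definitions. The "optimal set" depends on a consensus judgment and cannot be computed this way.

```python
from collections import Counter

# Hypothetical topic lists for three states (illustrative only).
STATE_TOPICS = {
    "State A": {"whole-number addition", "fractions", "geometry"},
    "State B": {"whole-number addition", "fractions", "probability"},
    "State C": {"whole-number addition", "measurement", "geometry"},
}


def topics_endorsed_by_at_least(curricula, k):
    """Return the topics that appear in the goals of at least k states."""
    counts = Counter(topic for topics in curricula.values() for topic in topics)
    return {topic for topic, n in counts.items() if n >= k}


# k equal to the number of states gives the least common set (intersection).
least_common = topics_endorsed_by_at_least(STATE_TOPICS, len(STATE_TOPICS))

# k equal to one gives the inclusive set (union).
inclusive = topics_endorsed_by_at_least(STATE_TOPICS, 1)

# Intermediate values of k trace out positions between the two extremes.
intermediate = topics_endorsed_by_at_least(STATE_TOPICS, 2)
```

With these hypothetical lists the least common set contains a single topic while the inclusive set contains five, which shows how quickly the assessable domain shrinks when unanimity is required.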

Although the "inclusive set" is apt to be too unwieldy in practice, it
illustrates an end of a continuum that is anchored at the other end by the
"least common set". On the surface the least common set appears the fairest
approach. It would not hold a state accountable for students learning content
that was not expected to be taught in its schools by a given grade level.
However, as will be discussed in some detail below, the least common set
approach can be faulted on several accounts including that of fairness.

The issue of where along the continuum between the least common and the
inclusive sets an assessment should be placed is not unique to the present
context. It long has been an important issue in the use of tests in program and
curriculum evaluation (e.g., Burstein, 1981; Cronbach, 1963; Walker &
Schaffarzick, 1974; Wargo & Green, 1978). If a test does not measure the
outcomes that correspond to important program goals, the evaluation will surely
be considered unfair. The judgment that the evaluation is unfair takes on
additional force when multiple programs are compared and the tests used to
measure the educational outcomes of the programs appear to match the goals of
one program better than another.

The latter point is clearly illustrated by the controversy that surrounded 
the Follow Through evaluation. Follow Through was a massive federal experiment 
that pitted twenty-two early education models against each other over the course 
of ten years. The model programs varied considerably in their stated goals but
were evaluated using a common set of outcome measures. Between-model
differences were found on some of the subtests of the Metropolitan Achievement
Test (MAT) (Stebbins, St. Pierre, Proper, Anderson, & Cerva, 1977). The
differences occurred on subtests that the evaluators classified as "basic 
skills" and favored models that were classified as emphasizing basic skills over 
models that were classified as having a "cognitive-conceptual" emphasis or an 
"affective-cognitive" emphasis. Press accounts of the evaluation presented the 
message that education that emphasizes the basics yields the best results. 

Because of the potential importance of the Follow Through evaluation, the
Ford Foundation sponsored a comprehensive third-party review of the evaluation.
The review resulted in a devastating critique that faulted the evaluation on
numerous grounds (House, Glass, McLean, & Walker, 1978). Of most relevance to
the present discussion, however, is the House et al. critique of the
measurement of the program outcomes and the characterization of those outcomes.
Their analysis led them to conclude that "the outcome measures assess very few
of the models' goals and strongly favor models that concentrate on teaching
mechanical skills" (House et al., 1978, p. 156).



Although not strictly a question of test content, the format of the test
items and administrative procedures can also have implications for the results
of an assessment. Even apparently trivial changes in item format, such as the
presentation of addition problems horizontally rather than vertically, have been
found to affect the scores that children obtain (Alderman, Swinton, & Braswell,
1979). More importantly, the outcome of an assessment can be affected by the
match between the format used to ask questions on the test and the format used
when students practice the skill in the instructional program and the amount of
practice that they have with similar tests (Alderman et al., 1979; Cooley &
Leinhardt, 1980; Roberts, 1980).

The match between what is taught and what is tested can have a substantial
effect on the performance on tests. The closer the match and the more the test
questions tap rote memory, the larger the likely effects. Indeed, two of the
most compelling examples involve the choice of words for tests of spelling or
for the vocabulary used to assess beginning reading. Hopkins and Wilkerson
(1965) compared four forms of the California Spelling Test to the course of
study guide used in California. Because the forms varied in the degree to which
they matched the study guide, knowledge of only those words that were in the
curriculum study guide would yield scores that differed by as much as 2.1 grade
equivalent units depending on which of the four forms was used. As would be
expected, the California students were much more likely to correctly respond to
words that were in the curriculum than words that were not.

Bianchini's (1978) analysis of the remarkable increase in the percentile
rank of the median reading achievement test score for first grade students
between 1970 and 1971 provides another example of the dramatic effect that the
degree of match between what is taught and what is tested can have on test
scores. Over the course of that single year, the median score for first grade
students throughout the state rose from the 38th to the 50th percentile. As
Bianchini's analysis suggests, however, the huge increase had more to do with
the fact that the test that was used to measure reading achievement was
different in 1971 than it was in 1970, than to any dramatic increase in the
quality of education provided to first grade children. Bianchini found that 55%
of the vocabulary on the test that was used in 1971 was included in the state's
first grade readers, whereas only 19% of the vocabulary on the test used the
previous year was included in the readers.

Results such as those reported by Bianchini (1978), Hopkins and Wilkerson
(1965), and others (e.g., Cooley & Leinhardt, 1980; Leinhardt, 1983; Leinhardt &
Seewald, 1981) might lead one to believe that the "least common set" approach is
necessary to avoid unfair comparisons. However, the solution is not that
simple. To begin with, the fact that two programs both teach children to add
two digit integers, for example, does not imply that both programs give that
skill the same priority or spend an equal amount of time teaching it. If the
children at one school were drilled extensively on the addition of two digit
integers, with little attention given to other arithmetic operations or to
mathematics concepts, while children at a second school spent some, but much
less, time on that skill while spending considerably more on other skills and on
concepts and problem solving, a test that only measured the addition of two
digit numbers would hardly be considered fair. As in the case of the Follow
Through evaluation, the test would strongly favor the first school because it
lacked more comprehensive coverage of the skills and concepts that were
emphasized at the second school. While such extremes are unlikely to be
encountered in practice, even at the level of individual schools much less at
the level of entire states, the example illustrates the fact that the use of the
least common set will tend to favor those who emphasize the skills and concepts
contained in that set at the expense of those that are not included in the set.

No matter what process is used to define the domain of content, it must
include knowledge, skills, and concepts that educators, policy makers, and the
general public consider important. This is part of the reason that the
Alexander-James (1987) report emphasized that the assessment should include
measures of higher order skills which the report defined to include "recognizing
a problem's general structure, defining goals, isolating the information
relevant to problem solutions, ... evaluating the merits of arguments, ...
reasoning, analyzing, explaining, and finding analogies" (p. 15). Such a list
does not appear to be compatible with the least common set approach to defining
the domain to be assessed for purposes of state-by-state comparisons.

The Alexander-James (1987) list of higher-order thinking skills would push
the assessment beyond a minimum set of basic skills that would be likely to
define the least common set to a broader set of goals. Inasmuch as there is
general agreement that higher-order thinking skills of the type envisioned by
the Alexander-James study group should be taught, the list is in keeping with
what Selden (1986b) has referred to as the "optimal consensus" approach wherein
the content of the assessment would be defined to include content for which a
consensus can be reached that given content knowledge and skills should be
taught. The idea of this approach is that it would allow the assessment to go
beyond minimal objectives that are already pursued by all and thereby have a
potentially broadening influence on the curriculum rather than a narrowing
influence that is apt to be associated with the least-common-set approach.

If the assessment is to encourage greater breadth and depth of content 
coverage, it will need to have a content domain with broadly defined limits and
emphasize more than simple factual knowledge. As Anderson (1986) has noted, such
an assessment is apt to measure several dimensions of achievement within each 
subject area and raise questions about the nature and number of scores to be 
reported. 



Number and Specificity of Scores. Cronbach (e.g., 1963, 1971) has long
argued that for purposes of evaluation, a comprehensive array of measures should
be sought. "An ideal evaluation might include measures of all the types of
proficiency that might reasonably be desired in the area in question, not just
the selected outcomes to which ... [a particular] curriculum directs substantial
attention" (Cronbach, 1963, p. 680). The assessment needs to provide a basis
for identifying areas that are judged to be important but that students are not
learning, whether or not the poor learning is the result of lack of exposure.
Furthermore, for purposes of making decisions about the curriculum or program of
instruction, the test results need to be reported separately for each of the
specific areas of proficiency, and not merely combined into a single overall
score.

The latter point runs counter to the goal of having a simple scorecard that
will allow the ranking of states along a single dimension. However, Cronbach's
rationale for maintaining separate scores is compelling.

If the original test or battery is a composite covering various types of 
content or various objectives, it implicitly weights those elements, either 
by the number of items allocated to each or by the way the score is 
calculated. Such a weighting cannot satisfy decision makers who hold 
values unlike those of the test developer. Consequently, an ideally
suitable battery for evaluation purposes will include separate measures of
all outcomes the users of the information consider important . . . Reporting
separate scores allows for the application of various systems of values.
It also enables the investigator to examine the nature of any weaknesses in
the program. (emphasis in the original) (Cronbach, 1971, p. 460).

The use of a single composite score not only forces an implicit set of
values on the outcome of the assessment and prevents those who hold different
values from seeing the results from that alternate framework, but the composite
may sometimes be insensitive to differences between the educational systems that
are being compared (Airasian & Madaus, 1983; Madaus, Airasian, & Kellaghan,
1980). In other instances, and of even greater concern, the composite may favor
a system with an emphasis that happens to match the content that the composite
weights most heavily.

The latter problem is illustrated by the results of Walker and
Schaffarzick's (1974) review of twenty-six studies that compared students who
had been exposed to a given subject matter using either "traditional" or
"innovative" curriculum materials and then tested with one or more measures of
achievement. Their review provides strong evidence that "different curricula
are associated with different patterns of achievement" (emphasis in the
original) (p. 97). Whether the results of the studies reviewed favored the
"traditional" or the "innovative" curriculum was largely determined by the
content of the tests. "Students using each curriculum do better than their
fellow students on tests which include items not covered at all in the other
curriculum or given less emphasis there" (Walker & Schaffarzick, 1974, p. 97).
If a single global score were used to compare the alternative curricula, an
outcome of no difference, one favoring the traditional curriculum, or one
favoring the innovative curriculum could be readily achieved according to the
relative weighting given to the test content favoring each.
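
The dependence of the global result on the item mix can be made concrete with a small calculation. The sketch below assumes a 20-item test and invented percent-correct values for two curricula; none of these numbers come from Walker and Schaffarzick or any other study. It shows only that the same subscores can yield all three outcomes as the weighting changes.

```python
# Hypothetical percent correct on each item type for two curricula
# (illustrative numbers only, not taken from any study).
PERCENT_CORRECT = {
    "traditional": {"traditional_items": 80, "innovative_items": 60},
    "innovative": {"traditional_items": 65, "innovative_items": 75},
}


def composite(curriculum, n_traditional_items, n_total_items=20):
    """Expected percent correct on a test with the given mix of items."""
    scores = PERCENT_CORRECT[curriculum]
    n_innovative_items = n_total_items - n_traditional_items
    points = (n_traditional_items * scores["traditional_items"]
              + n_innovative_items * scores["innovative_items"])
    return points / n_total_items


def favored(n_traditional_items):
    """Name the curriculum with the higher composite for this item mix."""
    trad = composite("traditional", n_traditional_items)
    innov = composite("innovative", n_traditional_items)
    if trad > innov:
        return "traditional"
    if innov > trad:
        return "innovative"
    return "no difference"


# Same subscores, three item mixes, three different conclusions.
outcomes = {n: favored(n) for n in (14, 10, 6)}
```

Reporting the two subscores separately would show the same pattern of strengths under every item mix, which is the point of Cronbach's argument for separate scores.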

The need to report multiple scores corresponding to narrowly defined
content areas is clearly demonstrated by recent experience with tests that are
customized to the specifications of a state or local district. The need for
multiple scores can also be demonstrated from recent experience with the NAEP
assessments in literature and U. S. history. In both instances it is evident
that a single total score can conceal specific areas of strength and weakness.
Furthermore, the relative standing of a given state, region, or other aggregate
of students can be greatly influenced by the number of items that happen to be
associated with specific content areas.

In the past, if a state or district wanted to compare the achievement of
its students to a national norm, it had to administer a norm-referenced test.
If the state or district also wanted to obtain test results on a test designed
to match locally defined objectives, a second test administration was generally 
required since the standardized test would not match the locally defined 
objectives as closely as desired. Recently, however, test publishers have begun 
offering an option of creating a "customized" test that consists of items 
selected according to locally specified objectives, but from which
norm-referenced scores are also produced.

Customized tests are the result of increased use of item response theory by
publishers in their test development and scaling process. One of the features
of item response theory that makes it especially appealing is the promise that,
once the theory has been used to calibrate a pool of test items, any set of
items from that pool can be used to place the performance of test takers on a
common scale. Thus, according to the theory, any set of previously calibrated
items could be selected by a state or district to be included among those on its
customized test and the resulting test scores could still be placed on the same
scale as the published version of the standardized test for which national norms
are available.
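
The promise described above can be illustrated with the simplest item response model. The sketch below assumes a one-parameter logistic (Rasch) model, which is a choice made here for illustration and not necessarily the model used by any publisher. The item difficulties, the two forms, and the ability value are all hypothetical. Expected (noise-free) raw scores are used so that the theoretical property is visible without sampling error.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability of a correct response for ability theta
    on an item with calibrated difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_raw_score(theta, difficulties):
    """Expected number of items answered correctly on a form."""
    return sum(p_correct(theta, b) for b in difficulties)


def ability_from_raw_score(raw_score, difficulties, lo=-6.0, hi=6.0):
    """Find theta whose expected raw score equals raw_score (bisection).

    Under the Rasch model this is the maximum likelihood ability estimate
    for any raw score strictly between 0 and the number of items.
    """
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical calibrated pool and two different forms drawn from it.
POOL = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
published_form = POOL[:5]    # easier items
customized_form = POOL[3:]   # harder items

true_theta = 0.4
est_published = ability_from_raw_score(
    expected_raw_score(true_theta, published_form), published_form)
est_customized = ability_from_raw_score(
    expected_raw_score(true_theta, customized_form), customized_form)
```

Both forms return the same ability estimate even though their raw scores differ, but only because the data were generated from the same model and the same difficulties. If local item difficulties depart from the calibrated values in some content areas, the two estimates would no longer agree.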

The quality of the norm-referenced scores that a state or district obtains
for its customized test depends on several factors, including (1) the adequacy
of the item response theory model for the set of items in the calibrated item
pool, (2) the number of calibrated items selected for the customized test, (3)
the statistical characteristics of the items selected from the item pool, and
(4) the degree to which items selected for the customized test match the content
coverage of the published version of the test for which the norms are available.
Recent experience with a major customized test, the Kentucky Essential Skills
Test (KEST), suggests that the last of these four considerations can be of
critical importance (Linn, 1986; Yen, Green, & Burkett, 1987).

Kentucky administered the KEST to essentially all eligible students in the
state in grades K through 12 for the first time in 1985. The 1985 KEST was a
customized test, containing, among other items, items that were selected from
the CTB/McGraw-Hill item pool. That pool includes items from the Comprehensive
Tests of Basic Skills (CTBS), Forms U and V, items from the California
Achievement Tests, Forms C and D, and previously unpublished items. Since all
items are calibrated to the CTBS scale, a test that had previously been
administered statewide in Kentucky, it was possible to obtain estimates of
performance on the CTBS scale from the administration of the KEST. When the
KEST results were obtained in 1985, however, at least two major anomalies were
observed. The most notable and troublesome of these was a precipitous
increase in the grade 5 mathematics test performance.

In 1982, 1983, and 1984, when the CTBS was administered statewide to fifth
grade students, the state mean normal curve equivalent (NCE) scores in
mathematics ranged from 50.4 to 54.8. In 1985, however, the mean NCE for grade
5 mathematics based on the KEST was 66.3. Thus, on the NCE scale, which has a
standard deviation of 21 for the national norm group, the state mean increased
in a single year by slightly over a half of the national norm group standard
deviation. Although a review of the KEST and the calibration of the items in
the item pool from which it was constructed did not suggest that the application
of item response theory was any more problematic than in many other widely
accepted applications, it was evident that the grade 5 mathematics results on
the 1985 KEST could not be meaningfully compared to the earlier CTBS results
(Linn, 1986).
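
The size of the jump follows directly from the figures just given. The short calculation below uses only the means and the standard deviation reported in the text. Which of the three earlier years produced the highest mean is not stated, so the gain is computed against both ends of the reported range.

```python
NCE_SD = 21.0                  # national norm group standard deviation, as stated
CTBS_MEAN_RANGE = (50.4, 54.8)  # state mean NCE, 1982 through 1984 (CTBS)
KEST_MEAN_1985 = 66.3           # state mean NCE, 1985 (KEST)


def gain_in_sd_units(new_mean, old_mean, sd=NCE_SD):
    """Express a change in mean NCE as a fraction of the norm group SD."""
    return (new_mean - old_mean) / sd


# Smallest possible gain: compare 1985 with the highest earlier mean.
gain_vs_highest = gain_in_sd_units(KEST_MEAN_1985, max(CTBS_MEAN_RANGE))

# Largest possible gain: compare 1985 with the lowest earlier mean.
gain_vs_lowest = gain_in_sd_units(KEST_MEAN_1985, min(CTBS_MEAN_RANGE))
```

The gain is about 0.55 standard deviations against the highest earlier mean, which matches the "slightly over a half" in the text, and about 0.76 against the lowest.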

The lack of comparability between the KEST and CTBS grade 5 mathematics 
tests is most plausibly explained by differences in the proportion of items 
on the KEST and the CTBS that are classified into specific content categories. 
The proportions of KEST and CTBS grade 5 mathematics items by content category
were as follows (Linn, 1986).

Content Category      CTBS Proportion    KEST Proportion

Numeration                 .42                .27
Number Theory              .03                .13
Measurement                .16                .11
Geometry                   .10                .20
Number Sentences           .19                .20
Problem Solving            .10                .09
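
One way to summarize how far the two item mixes diverge is to total the differences across categories. The sketch below uses the proportions reported by Linn (1986) as given above. The summary statistic, half the sum of absolute differences, is a convenience chosen here and is not a measure used in the source.

```python
# Proportion of grade 5 mathematics items by content category (Linn, 1986).
CTBS = {
    "Numeration": 0.42,
    "Number Theory": 0.03,
    "Measurement": 0.16,
    "Geometry": 0.10,
    "Number Sentences": 0.19,
    "Problem Solving": 0.10,
}
KEST = {
    "Numeration": 0.27,
    "Number Theory": 0.13,
    "Measurement": 0.11,
    "Geometry": 0.20,
    "Number Sentences": 0.20,
    "Problem Solving": 0.09,
}

# Change in emphasis for each category (positive means more weight on the KEST).
differences = {category: KEST[category] - CTBS[category] for category in CTBS}

# Half the sum of absolute differences is the share of the test that
# would have to be moved between categories to make the two mixes identical.
reallocated_share = sum(abs(d) for d in differences.values()) / 2

# Category whose weight changed the most in either direction.
largest_shift = max(differences, key=lambda c: abs(differences[c]))
```

By this measure about a fifth of the test (.21) was allocated differently on the KEST, with the largest single change being the drop in numeration from .42 to .27.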



As was demonstrated by Yen, Green, and Burkett (1987), systematic
differences as a function of content category between local and national
estimates of item response theory difficulty parameters are sometimes found.
Such differences can lead to misleading global score results when content
coverage changes. "Content equivalence between customized and normed tests is
essential if the customized test is to be NRT-equivalent and norm-valid" (Yen,
Green, & Burkett, 1987, p. 13). Separate reporting by specific content
categories, however, is needed in order to identify areas of strong and weak
performance and to make value judgments about the importance of changes in
scores on the global score.

The final example illustrating the importance of multiple scores
corresponding to specific content categories comes from the recent NAEP results
in literature and U. S. history (Applebee, Langer, & Mullis, 1987). Both the
literature and the U. S. history item sets met the usual criteria for deciding
if a unidimensional item response theory model is appropriate. Hence, single
global performance scores were estimated for each of the two broad content
domains.

Despite the apparent simplicity for each content area, however, substantial
differences that could be meaningfully interpreted were found for content
specific subsets of items as a function of region of the country, gender, and
race/ethnicity. For example, even though the performance of black test takers
was well below that of whites on the bulk of the literature and U. S. history
items, blacks outperformed whites on questions asking about black leaders or
black literature. Black test takers also did better than whites on several of
the questions dealing with slavery and civil rights. Similarly, though women
outperformed men on the overall literature scale, men did better on "items
focusing on strong male literary characters" (Applebee et al., 1987, p. 3),
such as Robin Hood, King Arthur, Samson, and Captain Ahab. Although the
Southeast region of the country scored well below the Northeast on the overall
literature scale, the converse was true on the 15 items dealing with Biblical
characters and stories.

The above examples illustrate two points that are of great potential
importance in any future state-by-state comparisons of student achievement.
First, the rank order on a single global score is apt to depend on the
particular weighting of the content categories. Based on the KEST results, one
might reasonably expect, for example, that Kentucky would have appeared better
on a grade 5 test with heavy emphasis on numeration than on one that emphasized
another content category such as number theory or geometry. Second, a single
global score can also conceal educationally important information about
strengths and weaknesses in the curriculum.

The need to focus on multiple content specific outcomes has been recognized
within the context of state assessments by Bock and his colleagues (Bock &
Mislevy, 1987; Bock, Mislevy, & Woodson, 1982; Mislevy, 1983). For purposes of
informing curriculum planners, assessment information needs to be provided for
highly specific content areas which Bock, Mislevy, and Woodson (1982) called
"indivisible curricular elements". These are "item domains that are
sufficiently homogeneous with respect to content that all the items in a given
domain would be similarly affected by changes in curricular emphasis" (Mislevy,
1983, p. 273).

Summary and Conclusions

It has been argued that the choice of content for a state-by-state assessment
will be one of many factors that will have a substantial influence on the
validity of inferences that may be drawn from a state-by-state assessment
system. Based on considerable experience in the use of tests in the evaluation
of alternative educational programs, it was concluded that there are great
disadvantages to an approach that focuses only on content and skills that are
thought to be taught by a given grade in all states. Such a "least-common set"
approach would be likely to give a relative advantage to states that narrow
their focus to only that least common set. The approach is more likely to
narrow than to broaden the curriculum.

Ideally, the domain for assessment would include separate measures of the 
full range of outcomes that are considered important by any of the states. The 
multiple measures would enable states to identify strengths and weaknesses and 
not just obtain a ranking on a global scorecard. The more inclusive set would 
encourage a broadening rather than a narrowing of the curriculum by calling 

Despite the desirability of having multiple scores corresponding to 
"indivisible curricular elements" for purposes of identifying strengths and 
weaknesses and planning changes in the curriculum, such scores clearly will not 
satisfy the demand for an overall number in reading or a single score for 
mathematics. Global scores will certainly need to be produced, in part because 
the amount of information would be too overwhelming for many of its intended 
uses if it were reported only at the level of indivisible curricular 
elements, and in part because there is a desire, as Ambach (1987) has noted, for 
a scorecard. Global scores can, and undoubtedly will, be produced. The 
argument here is not that such scores should not be produced, but that the 
ability to disaggregate the results to more specific content areas should be 
maintained. The disaggregated scores are needed to interpret the overall 
results and plan improvement. 
The Effectiveness of American Education 

American political attention has turned with increasing 
intensity to the matter of educational quality. From the reports of 
commissions and panels to debates by presidential candidates, the 
focus on students, teachers, and schools grows sharper every day. At 
the center of concern is a deceptively simple question: How well do 
our schools prepare our students? 

It doesn't matter if the language emphasizes excellence, subject 
matter understanding, productivity, or competitiveness, the meaning 
of the debate is clear: Can we describe, judge and improve the 
effectiveness of public schools? 

Over the years, significant investments have been made in 
trying to answer these questions. Standardized achievement tests, 
educational program evaluations, teacher testing, and minimum 
competency tests for students all are thought to provide useful 
information to help make judgments about the effects of educational 
services on students. Many of these options have roots in the mid- 
sixties enactment of federal legislation to assist educationally 
disadvantaged students. This new legislation required that the 
federal government evaluate the effects of its efforts to provide 
compensatory resources for students. The legislation was directly 
responsible for the rapid development and growth of the evaluation 
field and for many scientific developments in the measurement of 
human performance. Through the ensuing decades, one or another 
particular version of evaluation or measurement was selected as the 
new solution for understanding school effectiveness, the options 
coming, it seemed, in overlapping waves. Remember? Different 
solutions included setting objectives and measuring student 
performance, local standardized student testing, program evaluation, 
Scholastic Aptitude Examination (SAT) score decline, state minimum 
competency examinations, teacher testing, state assessment, and "The 
Wall Chart," a national comparison of educational systems. None of 
these approaches were found to be wholly satisfactory, but, after the 
initial blaze of interest died down, none were retired either. Instead, 
our attempts to understand educational quality have resulted in an 
increasing set of measures and approaches designed to shed some 
light on the issue. But do they? Imagine that we could start over, 
fresh and unsullied by our prior measurement experience. What 
would be fair measures of the effectiveness of our educational 
programs? 

To answer this question, we first must decide what level 
of information we want. Making a judgment about all of American 
education and assessing the effectiveness of First Street School in 
your hometown require different levels of information. In the first 
case, we would look for common features of schools and curricula to 
base our judgment. When looking at a particular school, however, we 
can be much more attentive to the community characteristics, the 
kinds of students attending the school, the particular goals of the 
school, and other special conditions. In both cases, however, we 
simply want to know the following: 

What are the students learning? 
How well do the teachers teach? 
What is the quality of our schools? 

The public seems equally interested in the concrete accomplishments 
of local schools and the general descriptions of the educational 
system at large. 

Educators want answers to these questions. These answers 
should not simply describe the state of performance for students, 
teachers, and school administrators, but should ideally permit us to 
devise actions to make things better. We want information for more 
than curiosity's sake; we want it to help us improve education. This 
desire to face and fix what's wrong requires that the information we 
collect gives us more than categorical "good" or "poor" labels. We 
need enough detail to guide our policies and practices. 

With this discussion as preface, let's consider in turn questions 
of effectiveness that involve students, teachers, and schools. 

Student Learning 

Student learning has been traditionally measured by 
achievement tests. For public accountability purposes, teacher-made 
tests have never been regarded as sufficient. Rather, because 
accountability implies some sort of comparison, tests that have 
standard content and rather general applicability have been used. 
Without rehashing two decades of concerns about standardized 
testing, a few issues remain salient: 

• Standardized tests allow comparisons among schools and 
regions. They may, however, be somewhat insensitive to 
curricular and instructional variations. Because they 
are prepared to be of widest utility, standardized tests 
may omit areas of particular emphasis for particular 
schools. These tests provide information on only a 
narrow slice of school activities. 

• Standardized tests most often ask children to answer 
questions given in multiple-choice format. I believe this 
format greatly underestimates student performance. 

• Because of technical reasons used in test statistics, very 
small absolute differences (for instance, one test item) 
might mean an improvement of a "grade level" or so. 
Making inferences about educational quality based on 
these differences is a shaky proposition. 



Test performance still is, in that unfortunate phrase, the 
bottom line for many who would assess the effectiveness of the 
schools. At this time, standardized tests are regarded by many 
policymakers as credible and objective. Achievement testing will not 
go away, and for good reason. Students and, by implication, the 
schools to which they go must be held accountable for teaching 
students and for attempting to measure what they have learned. 
Standardized tests are thought by many to be the best approach we 
have. 

But these tests can be greatly improved. At the Center for 
Research on Evaluation, Standards, and Student Testing (CRESST), 
sponsored by the US Office of Educational Research and 
Improvement, we are in the midst of a five-year research program to 
improve the quality of testing for use in the schools. 

The precepts of our program, and the way we believe testing 
ought to be improved, fix on a small set of critical issues. In one way 
or another, our attention focuses on validity, or the quality of the 
information the test provides us and the degree to which we can 
believe it. 



Validity. Validity of achievement measures has a number of 
components (See Baker and Herman, 1986, for a fuller discussion). 
One critical component is the degree to which the way performance 
is measured matches the mode in which learning best occurs. With 
the advances in cognitive science, we believe we can design 
measures that more productively represent the richness of learning. 
For example, we are interested in assuring that in mathematics, 
science, and history, students be given different ways to demonstrate 
their competence, perhaps in multiple choice tests, perhaps in other 
paper and pencil formats, perhaps using computer dynamic displays, 
perhaps in writing. Many current testing formats developed out of 
convenience for the administration and scoring processes rather than 
because they were the best ways to assess complex human 
understanding. One attribute of tests is that they often force 
students to give the first, quick response, rather than a thoughtful, 
reasoned answer. The balance between conserving the time spent on 
testing and providing enough opportunity for adequate thought is 
still unsettled. Perhaps a more diverse menu of testing approaches 
will increase the overall validity of our measures, and allow testing 
approaches to match better student propensities. 

A second validity concern relates to the content or subject 
matter of what is to be tested. One of the sadder outcomes of the 
behavioral objective movement and of inquiry approaches of the 
early seventies was the attention paid to process at the expense of 
the content to which these processes applied. We have seen the 
pendulum swing widely on this issue during the last two decades. 
Given the popularity of books like Cultural Literacy (Hirsch, 1987) 
and the scandalous blanks and misunderstandings in our students' 
knowledge, we are again on the verge of another swing towards 
content. It's tempting to devise tests that can pinpoint such content 
errors. This time, however, we want to assure that we go well 
beyond identification or recognition of specific facts and concepts. 
We intend to integrate measurement approaches that wed content 
with sophisticated approaches to demonstrating understanding, such 
as complex essays. We at UCLA are developing the technology to 
score such essays reliably and relatively inexpensively. 

Third, we are interested in measures that can be related 
directly to instructional options. We should be measuring 
performance that schools can affect. This means that, where 
possible, we should be collecting information about teaching 
practices, student familiarity with content, and so on, at the same 
time that we measure student performance. 

Fourth, our measures must be valid when individual and group 
differences are considered. Whether a test is fair is a psychological as 
well as an empirical issue. We particularly want to assure that our 
measures validly assess strengths and weaknesses of our pluralistic 
student body in a way that contributes to their motivation to 
continue learning. 

Quality of interpretation. Even when student achievement 
is measured validly, the way such findings are interpreted makes a 
difference. Interpretation involves relating findings to other similar 
measures of performance, comparing findings to the performance of 
other similar groups of students or schools, analyzing findings in the 
light of previous performance to see the development of trends over 
time, or looking at performance in terms of some predetermined 
standard. Comparison to other student groups is the most common 
interpretation strategy. This comparison is the basis of "norms," or 
averages provided for many nationally standardized tests. In some 
state assessments, comparisons for student performance are 
provided by looking at the performance of students in schools of 
similar size and community location. More recently, the federal 
government has reported the comparison of student performance on 
the SAT state by state, a specific approach to be discussed later. 

A central issue of interpretation is what is being compared. 
Are tests of individual students used to make comparisons among 
schools? What other information needs to be collected if such use of 
information is to be sensible? 

The first question for these sorts of comparisons is: "Is the 
comparison fair?" One shouldn't compare a small, stable suburban 
school with a central city school that has a high mobility rate. Given 
the increasing diversity of our students, comparisons now must 
involve issues such as language in the home and length of time in the 
school in addition to the more usual socioeconomic measures. 

Other options have been the international comparisons, where 
we look at US students in comparison to those in other countries. 
While such comparisons might be useful in setting goals for our 
students, the inference remains that we should adopt practices 
embedded in other cultures or in other constitutional, and more 
centralized, arrangements for education policymaking. Such an 
inference is probably unwarranted. 

Moreover, the bane of most normative comparisons is that half 
of the group is always below average, a status unacceptable to most 
educational policymakers. No one yet has figured out how all 
students can perform "above the average." 

To sum up, what should we want in student achievement 
measurement? 



• More than one measure of the same phenomenon, such as 
reading comprehension (to allow for corroboration from 
different sources), but with no expectation that all 
students need to take multiple measures. 

• More than one kind of testing format, such as multiple 
choice and written answers. 

• Tests that give students adequate time to perform serious 
cognitive tasks. 

• Tests that measure both the content (what) and the 
process (how) that students use to solve complex 
problems of understanding. 

• Tests that can be analyzed to guide instructional 
planning. 

• Test results that are understandable, timely, and usable 
by teachers for instruction and planning (see Herman and 
Dorr-Bremme, in press, for a report of teachers' test use). 

• Reports of test results that are fair to students, teachers, 
and schools. 
the state or school district to set standards of this sort, conflicts have 
developed on a number of points. Rudner (1987) points out that the 
standards for many of these tests have been set very low. Lorrie 
Shepard, in a case study of the Texas Teacher Test (1987), describes 
how it might be possible to pass the test by being testwise rather 
than being skilled in the area the test was assessing. Ellwein and 
Glass (1986) infer from their case study that teacher testing is 
mostly symbolic and has very little to do with actually identifying 
deficiencies and improving instruction. Involved in many of the 
analyses of teacher testing is the question of when it should occur 
(pre-service? pre-teacher education program?) and to whom the 
sanctions should apply (the teacher? the degree-granting institution? 
the teacher training institution?). 

Student achievement as a measure of teaching. Using 
student achievement as a way to estimate teaching effectiveness is 
another approach. It seems like a reasonable tactic; after all, 
teachers ought to help students learn. Clearly subject to the validity 
concerns about student testing listed above, the use of such measures 
to assess teachers unfortunately adds new complexity. Minimally, 
these comparisons may necessitate complex tracking of students who 
enter particular teachers' classes. Statistically equating students 
with different entry competencies is sure to be an unsatisfactory 
way to compare teachers' relative merit in promoting achievement. 
On the one hand, it's harder to teach students who have inadequate 
backgrounds. Alternatively, it's also difficult (because of artifacts of 
tests) to show real improvement when the student group comes in 
with very strong achievement levels. In either case, the 
achievement tests will probably misrepresent the nature of the 
teacher's effort. Thankfully, recent assessment systems for teachers 
are attempting to represent more broadly the nature of teachers' 
efforts. 
Educational Quality of Schools 



Who wants to know? The desire to find out how schools are 
doing is clearly legitimate, and educators, policymakers, and 
researchers continue to propose alternative sources of information. 
One of the problems we face is providing the right information to the 
right people. Congressional policymakers want to know whether the 
schools are working (Congress of the United States Congressional 
Budget Office, 1987). At different times, their concern may be 
focused on the quality of what is learned (as in the post-Sputnik 
period) or who is learning (when equity concerns are central on the 
educational agenda). Their needs are to assess the impact of 
resources they have invested and to target continuing or new needs. 
They need relatively unambiguous, clear information. To an even 
greater extent, state level policymakers are concerned with the 
effects of specific policies related to financing, curriculum, and 
certification, i.e., their efforts to reform schools in their states. Local 
school boards and their administrations have needs for information 
related to the quality of their policy implementation and the 
progress toward discretionary goals, given the particular 
characteristics of their community. Each set of policymakers has 
differential need for detail and different opportunity to influence the 
reality of classroom practice. The hodgepodge of conflicting 
information from local, state, and national evaluations doesn't make 
evaluation of educational effectiveness any easier. Some new 
approaches may offer some relief. 

Comparisons state by state. An approach under 
consideration by the federal government is to transform the 
measurement practices of the National Assessment of Educational 
Progress (NAEP) so that state-by-state comparisons may be possible. 
NAEP has been periodically administering measures to US students in 
reading comprehension, writing, and mathematics. 
At the present time, the administration of these measures allows for 
interpretation by broad geographical region, rather than for each 
state. The proposal calls for administering these measures so that a 
representative sample of each state would be tested and described in 
NAEP reports. The proposal also expands the number of subject 
matters assessed. If accepted, this approach could focus the 
evaluation of schooling on the NAEP achievement measures. Is this a 
good thing? There is a clear division of opinion. Let me review some 
of the arguments on behalf of and against this approach. On the 
positive side: 

• A common basis for understanding student achievement 
would be systematically available. 

• The quality of measures would continue to improve 
because of the salience of the measures. 

• States could use such information for their own policy 
assessment to check their progress. 

• Interpretation for policy purposes would be simplified. 

• States would be able to compare themselves to subsets of 
other similar states. 

On the negative side, critics contend that: 

• NAEP may turn into a national achievement test, and a 
national curriculum may follow. 

• NAEP will not be sufficiently responsive to local or 
regional differences in curricula, students, or economic 
factors to permit legitimate comparisons. 

• NAEP will drive out state and local tests, which are more 
responsive to local curricula. 

• The pressure for school district comparisons will follow 
state comparisons. 



• Because NAEP's strength will be comparisons over 
time, the pressure to keep NAEP measures the same will 
inhibit new goals for the curriculum and new approaches 
to measurement. 

• A single set of measures can be wrong. Given the state of 
understanding of achievement measurement, investing in 
different assessment approaches is the most prudent way 
to collect policy relevant information. 

For each of these points, both positive and negative, there are 
counterarguments, and counter-counter arguments. If the problem 
were simple, it would already be solved. The attractiveness of a 
clearly understood, single set of measures for American education is 
strong, even when the validity of the measures for assessing local 
and state educational policies is questioned. The state-by-state NAEP 
approach needs to be understood as an attempt to catch hold of what 
our schools are doing. 

Quality indicators. Another tack is the quality indicators 
movement (Office of Educational Research and Improvement, 1987). 
The goal of this effort is to identify and systematically collect 
information that can give a picture of the overall quality of American 
education, not simply limited to achievement testing. Work in this 
area has been conducted by The Rand Corporation, the Center for 
Policy Research in Education at Rutgers, the Center for Research on 
Evaluation, Standards, and Student Testing (CRESST) at UCLA, and by 
numerous other institutions and scholars. Part of its impetus comes 
from the realm of economic indicators, where seemingly simple 
numbers like the Gross National Product, unemployment figures, and 
the Dow Jones Average efficiently communicate the economic health 
of the country. The Center for Education Statistics (a division of 
OERI), under the leadership of Emerson Elliott, has been working on 
indicators of educational quality. These indicators include figures 
such as dropout rates, per capita student funding, student-teacher 
class ratios, enrollment figures, and the like. Problems encountered 
with this approach include the vastly different reporting approaches 
taken for something as understandable as dropout. Different 
districts and states count dropouts at different intervals, for different 
ages or grades, use different base rates, track student mobility 
differently, and so on. Getting everyone to agree on a single 
reporting approach, even for an "easily understood" concept like 
dropout, is a Herculean task. 
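
To see how much the choice of base can matter, consider a minimal sketch. All counts are hypothetical, and the two definitions shown (an annual rate over total enrollment and a cohort rate over an entering class) are illustrative assumptions, not the definitions used by any particular state or district.

```python
# Hypothetical counts for a single high school (illustrative only).
total_enrollment_grades_9_to_12 = 1800   # students enrolled this fall
leavers_this_year = 90                   # left without a diploma this year
entering_grade_9_cohort = 500            # class that entered four years ago
cohort_leavers_over_four_years = 150     # of that class, left before graduating


def annual_rate(leavers, total_enrollment):
    """Share of all enrolled students who left during a single year."""
    return leavers / total_enrollment


def cohort_rate(cohort_leavers, cohort_size):
    """Share of one entering class that left before graduating."""
    return cohort_leavers / cohort_size


annual = annual_rate(leavers_this_year, total_enrollment_grades_9_to_12)
cohort = cohort_rate(cohort_leavers_over_four_years, entering_grade_9_cohort)

print(f"annual dropout rate: {annual:.1%}")
print(f"cohort dropout rate: {cohort:.1%}")
```

Under these assumed counts the same school reports a rate six times higher on the cohort definition than on the annual one, before any differences in counting intervals, ages, or mobility tracking are considered.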

Outcomes like achievement test scores, college admission rates, 
or dropout figures represent the easy part of indicators. Quality 
indicators should also take into account input variables and measures 
of process. 

Imagine one wanted a "quality indicator" related to some 
intermediate process, such as student coursework. In fact, UCLA and 
The Rand Corporation are collaborating on the development of such 
indicators. We need to consider how to determine "quality" in a valid 
and comprehensive way, how to collect such information accurately 
and comfortably in schools, and how to report such findings so that 
the effects of educational reform can be tracked. If we (or others) 
can solve such a problem, educational achievement tests can be 
relieved of the perhaps excessive burden they carry as measures of 
the effects of different policies. Making changes, such as adding 
coursework requirements, strengthening the content of the 
curriculum in a particular area, or requiring textbooks to exhibit 
certain content standards, are all hypotheses that policymakers make 
about what will help schools. Indicators of the extent to which these 
policies are used are a first step; studying the relationship of the level 
of their use and resultant levels of student achievement is a second 
critical link. Yet, the indicator movement must be cautious about 
identifying a single magic index or number to stand for complex 
educational processes. As Leigh Burstein of UCLA points out, the 
context in which such data are reported, understood, and interpreted, 
is central to the success of this effort (Baker, 1987). 
Summary 

The search for approaches to assess schools, their teachers, and 
students, will continue. This discussion has touched lightly on a 
number of complex issues. Controversies will also continue, and we 
can be sure that almost any decision will be rethought sometime in 
the future. Our interest in the research community is to keep a few 
issues in front of the public and the decisionmakers in this area. 

First, we believe that the validity of any measure or indicator 
should be paramount, whether it is a measure of outcomes, like 
student achievement, of input, like teacher knowledge, or of 
processes, like student coursework. These measures should be 
designed in a way to allow multiple or flexible ways to demonstrate 
success for different students. These measures should help us to 
pinpoint and fix weaknesses in policy and practice. Finally, these 
measures first must serve the interests of students and improve 
their schools. We must overcome our habit of preparing measures 
for the convenience of test developers, administrators, legislators, or 
even teachers. Rather, we need to consider the impact of our 
approaches to assessing educational effectiveness on our current and 
future students. 
Further Thinking on the Merger of the National Assessment of 
Educational Progress and the School and Staffing Surveys: 
Summary and Recommendations from Two Meetings of 
Statisticians and Researchers 



Leigh Burstein 
Pam Aschbacher 
Center for Research on Evaluation, 
Standards and Student Testing 
University of California, Los Angeles 



This report summarizes the discussions from two meetings 
(held at the Center for Education Statistics (CES), Wednesday, 
November 18 and Friday, November 20, 1987) to advise CES 
regarding the possible merger of the National Assessment of 
Educational Progress (NAEP) and the School and Staffing Surveys 
(SASS). It also incorporates points from written statements 
provided by selected meeting participants and from other 
individuals whose advice was sought but were unable to attend. 
The report begins with a brief description of the background and 
context of the meetings. A summary of the main points of 
discussion and recommendations to CES follow. The latter are 
further illuminated by the written statements from participants 
(ITEM VI in the attachments). 

Background 

The meetings on merging NAEP and SASS were organized as an 
activity of the Quality Indicators Study Group of the Center for 
Research on Evaluation, Standards, and Student Testing (CRESST) 
at the University of California, Los Angeles. The CRESST 
activity was in response to conflicting advice received by 
Emerson J. Elliott, Director of CES, and his request for 
assistance in obtaining further thinking about the possible 
merger. 

As described at the outset of the meeting, the dilemma was as 
follows. CES has 2 major studies serving complementary purposes, 
both of which are state-representative. NAEP is a study of 
student outcomes and is newly state-representative (it used to be 
based only on a national sample). SASS is a study of teacher demand 
and shortage based on state-representative data to be fielded for 
the first time in 1988 but with the intent to develop a time 
series on important school characteristics. Currently, the 
studies have different foci with respect to units of observation 
and analysis (NAEP focusses on students and their teachers and 
schools; SASS on teachers and the schools and districts within 
which they work) and consequently different sampling universes. 
In 1988 (the first year for SASS), the studies will be fielded in 
non-overlapping schools. 
Reasons offered in support of integrating the two studies 
include: 

1) each may provide contextual data to better interpret the 
other; 

2) the merger represents an opportunity to look at 
relations between two sets of data; 

3) there should be cost savings from reducing number of 
teachers sampled; and 

4) it may be possible to reduce overall respondent burden 
although the burden may increase on some of those 
sampled. 

On the surface, then, it seemed attractive to merge the two. 
In fact, in previous meetings, the Advisory Council on Education 
Statistics (ACES) has recommended that a merger of NAEP and SASS 
proceed. This ACES recommendation was consistent with the 
recommendations on linking data collections from the report on 
alternatives for a national data system on elementary and 
secondary education prepared by Hall, Jaeger, Kearney and Wiley 
(December 20, 1985). 

Yet other segments of the educational community raised many 
questions about whether the merger was a good idea based on a 
variety of technical, substantive, practical, and political 
grounds. These include 

— management concerns associated with two huge data 
collection efforts and the need to protect NAEP at all 
costs; 

— lack of sufficient prior experience with SASS to judge 
how this survey will be most useful; 

— questions of which background data should be related to 
student-achievement? how significant would this add-on 
of questions be? Couldn't this be done in smaller 
studies? 

CES has had many meetings and written several papers about 
a NAEP/SASS merger (Cf. ITEM VII). Since CES needs to field the 
study in March 1988, there is need for immediate input. Moreover, 
at the time of the meeting, CES didn't have any information on 
the overlap and "strain" projected from simulations of sampling 
for the three major studies (NELS, NAEP & SASS). 

The purpose of the meetings was to bring together persons 
knowledgeable about educational research, statistical, and policy 
analytic issues (Cf. ITEM III for a list of meeting participants; 
other individuals were invited but were unable to attend) that 
CES's data collections (including NAEP, SASS, Longitudinal 
Studies) to: 
a. Consider the range of issues that CES had already 
identified and review its available documentation 
regarding these issues; 

b. Augment CES's prior analyses with other evidence that 
bears on the perceived benefits and costs of the 
proposed merger; 

c. Assess the likely consequences (e.g., for knowledge 
production, enhancing policy analysis capabilities, 
improving or degrading the quality of data) of the 
merger; 

d. Recommend options with regard to the decision process on 
the possible merger and the steps that should be 
undertaken in advance of a final determination to 
proceed with the merger. 

Participants were provided in advance specific questions and 
issues that the meeting was intended to consider (ITEM II) and a 
set of pertinent documents (ITEM VII plus copies of CES Working 
Paper 2, the Hall et al. report, the synthesis of invited papers 
from the Elementary and Secondary Data Collection Redesign 
Project, and the report from the planning conference to consider 
a merger of NAEP and NELS). Three persons (Richard Jaeger, 
University of North Carolina, Greensboro; Richard Murnane, 
Harvard University; Marshall Smith, Stanford University) were 
asked to provide written input even though they were unable to 
attend the meetings. 

Two 5-1/2 hour meetings were held, with a day in between to 
accommodate the schedules of the desired participants and to allow 
time to prepare information from the first day's discussion (ITEM 
IV) to assist the second day's deliberations. Meeting 
participants were also asked to provide written summary 
statements regarding their views on the merger. A follow-up 
letter was sent to all participants on Wednesday, November 24th, 
to provide an initial summary of the meetings' main points and a 
preliminary list of recommendations, and to encourage 
participants to submit written statements and inform them of next 
steps. As of December 10th, ten meeting participants (in 
addition to Jaeger, Murnane, and Smith) had provided such 
statements (see attached ITEM VI).

Summary of the Issues Discussed 

Despite the diversity of perspectives and interests 
represented in both days' meetings, there was considerable 
consensus about the basic issues that need to be addressed. 
These issues were also echoed in the written statements.
Display 1 represents an attempt to code the written statements
with respect to their consideration of the major issues and
support for the recommendations.

Display 1



The main issues addressed were as follows: 

Issue 1. What does "merger" mean and how comprehensive (with
respect to instrumentation and to samples) should it be?

Discussion: The merger of NAEP and SASS could occur in a
variety of ways that vary in the extent of combination. Merger
options discussed include:

o Complete merger: joint administration to the same sample
of schools on the same cycle (see Richard Jaeger letter)

o Integrate the two studies in only some states 

o Merge the two infrequently (e.g. every 6 years, which could 
be accomplished by putting SASS on a 3-yr cycle and leaving 
NAEP on a 2-yr cycle) 

o Move (or repeat) a small set of items on school policy and
teacher characteristics from SASS to NAEP in order to
explore certain policy issues, allow NAEP to get better
information with its own sample, and allow SASS to keep its
own sample and purpose. (Note: this approach would not
necessitate including "high-burden" items from SASS in a
revised NAEP.)

o Use part of SASS in NAEP schools

o Merge the two only at the national level.

A repeated theme of the discussion and written statements is
that "considerations of how/whether some elements of the two data
collections might be usefully integrated should be examined
carefully in the light of specific analytic benefits, respondent
burden, data objectives, and periodicity of the data collections
before a decision to seek merger" (Linda Darling-Hammond). The
primary rationales for the merger proposal were the analytic
benefits from adding SASS data about districts, schools, and
teachers to NAEP data on schools, teachers, and students and the
efficiencies of data collection that might be obtained through
using the same sample for NAEP and SASS.

There was general sentiment for moving or readministering
some SASS questions as part of NAEP and little or no support for
complete merger, at least in the near future. However, how far
to proceed needs to be guided by the tradeoffs between the
analytical purposes such a merger could serve and the possible
consequences in terms of burden and costs of the particular form
of merger.



DISPLAY 1. PARTICIPANTS' OPINIONS REGARDING ISSUES IN THE MERGER OF NAEP AND SASS (1)

ISSUES                                              PARTICIPANTS (2)

1. Should Merger Occur
   A. Complete Merger in 1990
   B. Administer subset of SASS with NAEP
   C. Further Study regarding 1992/1994 needed

2. Purpose of Merger
   A. Causal Analysis of school effects
   B. Policy Analytic
   C. Access and Participation

3. Topics Requiring further Study
   A. Conceptual Analysis of issues that Merged Data would address
   B. Empirical Studies with existing data
   C. Respondent Burden (Non-Cooperation)
   D. Costs of Merger
   E. Value of Design of Common Sample Universe
   F. Incentives for School Participation

4. Other Options and Issues
   A. Modify the Cycle for SASS                     Y  Y  Y  ?  Y?
   B. Field Parts of SASS on Different Cycles       Y  Y  ?  Y?



NOTES:

1. The codes are derived from a reading of the written statements provided by the participants. No attempt was made to reach a
judgement based solely on comments during the meetings. Full text of statements is attached at Item IV.

2. Participants are denoted by initials and are listed alphabetically. The participants who provided written statements are: David
Bayless (DB), Al Beaton (AB), Linda Darling-Hammond (LDH), Ed Haertel (EH), Morris Hansen (MH), Richard Jaeger (RJ), Tom
Kerins (TK), Dan Koretz (DK), Richard Murnane (RM), Paul Sandifer (PS), Ramsay Selden (RS), Marshall Smith (MS), Brenda
Turnbull (BT), David Wiley (DW).

3. The codes are as follows:

Y: Yes (responded affirmatively to the issue)
N: No (responded negatively to the issue)
?: Statement may address issue but is unclear
Blank: Did not mention issue

Issue 2. What analytical purposes should guide any merger
decisions?



Discussion: Three sets of analytical purposes might serve to
justify the merger: 

1. Causal Analysis of Effects of Schools and Schooling 

2. Policy Relevant Research Issues 

3. Topical Policy Issues 

There was considerable agreement among participants that 
"valid analysis/causal modelling" of school effects cannot be 
obtained through a merger of NAEP and SASS. The resultant survey 
would still be a cross-sectional one, and a longitudinal survey
such as HS&B and NELS is necessary to contribute to this purpose.
Moreover, the merged NAEP/SASS would encourage "invalid but
potentially influential studies of school effects that could
seriously distort policy" (Dan Koretz; statements from Richard
Murnane and Marshall Smith make essentially the same point).

The ability to enhance policy analytic capacity (purposes 2
and 3), on the other hand, received considerable support.
Improving the utility of both NAEP and SASS as indicator series
was considered a valid and powerful reason for consideration of
further integration of the two surveys. But such possible
improvements should be attempted only if the integrity of
NAEP as an indicator of student achievement trends is not
threatened. The notion of using NAEP as a source of student
performance data for SASS, for example, was not considered a
sufficiently compelling reason to merge at this point, especially
since there is as yet no history of the functioning of SASS and
its niche in the comprehensive education information system to be
maintained by CES.

The most discussed and agreed upon purpose for some degree 
of merger of NAEP and SASS was to enhance NAEP's usefulness in 
exploring equity issues. That is, the selective inclusion of 
SASS questions on teaching and schooling conditions could be used
to examine differential student assignment to types of teachers
and classes ("access"). It was felt that this analysis would be
useful at both the state and national levels for both public and
private schools. The following questions were raised about even
this purpose, however:

1) Would this information actually be used? Some states 
evidently have such information already and do not use it. 

2) Is it properly a federal task to provide such information 
on a state-by-state basis?

(Letters from Ed Haertel and Marshall Smith convey the issues on
this point.)

Two other (non-causal) analyses were proposed briefly:
linking student achievement to staffing variables, and linking
teaching strategies to staffing (Dan Koretz's letter provides a
useful example here). The national and state-level patterns in
these relations over time were of interest to some participants.



The message that came through loudly was that to date there
have not been sufficient conceptual and empirical analyses of the
specific analytical purposes to be served by integrating the two
data bases. Analyses thus far have focussed largely on
operational matters (logistics, respondent burden, costs) while
general and overly vague purposes remain the source of the urge
to merge. Before proceeding too far down the road to a merger
of any consequence (other than the augmentation of 1990 NAEP with
a few questions from SASS), further study of the specific
analytical issues to be addressed through merger is essential.

Another point latent in the discussion in this area was 
whether some of the substantive reasons put forward as a basis 
for the merger might be best served through special studies that 
parallel and piggyback on either NAEP or SASS. This type of 
linkage is suggested in the RAND Corporation's report to the NSF 
on alternatives for the development of a comprehensive 
information system in mathematics, science and technology 
education (Shavelson et al., 1987). Currently, certain bridging
studies are fielded along with NAEP to address special topics.
These are conducted on subsamples as part of the overall study.
Many of the ideas that warrant special attention could be fielded
in a subset of locales, for instance. One participant (Ed
Haertel) suggested that state assessments might be a viable source
of the outcomes data to augment SASS. In this case, these would
be special studies that would add little actual additional local
burden, especially if linkage were carried out at the state
level.

There was considerable sentiment from the entire group that
CES needs to encourage and commission conceptual and empirical
analyses from a broader audience to assist in their development
of analytical purposes for integrated data collection. In
particular, mechanisms that would encourage empirical studies
with already collected NAEP teacher and school data and with the
1988 SASS data are critical if the possibility of a significant
merger remains a consideration for the agency.

Issue 3. What are the likely consequences of merger alternatives
with respect to respondent burden and costs?

Discussion: The participants expressed a concern that there
are so many possibilities and so many assumptions and variables
that affect cost, that the cost issues are not clear at all. For
example, merging the two studies might imply that all the data in
the merged portion should be collected with the same method (i.e.,
personal interview or mail survey). While it may be expected
that the NAEP interview method is less expensive, it might be
prohibitive when used with a sample the size of the SASS. The
choice among modes of data collection within the merged survey
was viewed as quite important with respect to both the
cost and burden issues (e.g., see statements from David Bayless
and Morris Hansen). In particular, the savings from merging SASS
are slight given the current plans calling for a mail survey.



A major concern throughout the meeting was the effect of burden
(actual and perceived) on quantity and quality of data. It was
pointed out that these effects might be exacerbated over time
when the data is repeatedly collected every couple of years (see
David Bayless and Richard Jaeger statements). Most of the item
overlap of the two studies falls on school administrators and
teachers. However, district administrators would also perceive
increased burden as the number of participating schools or amount
of participation by any one school within their district
increases. The issue is more one of politics than loss of
instructional time.

Since districts differ so, it is expected that they may react 
differently to the burden of a merger. Some might elect to test 
a universe of districts since so many may be sampled, whereas 
others may elect not to participate at all. It is feared that 
most districts are small enough that the increased burden would 
discourage them from participating. It was agreed that the 1988 
data collection efforts in NAEP and SASS separately will provide 
some basis for estimating the burden of a partially merged study 
in the future. Many participants (in particular, David Bayless, 
Al Beaton, Linda Darling-Hammond, Ed Haertel, Richard Jaeger, 
and Brenda Turnbull) strongly urged more systematic study of
respondent burden options before proceeding with anything beyond 
a mild data linkage. 

The group discussed the desirability of providing some payoff
to the districts and schools for participating; however, no
individualized reports or products seemed appropriate given the
sampling methods. In addition, it was pointed out that providing
any individualized information to participants in a timely manner
would also be difficult. While the question of appropriate
incentives was recognized, there was much divergence of opinion
on how to respond to this concern. (Note, for instance,
statements from David Bayless, Linda Darling-Hammond, Morris
Hansen, Richard Jaeger, and Paul Sandifer.)

Issue 4. How does the question of the desirable/necessary
cycle/periodicity and timing of SASS (or parts of SASS)
interact with the above?

Discussion: There was considerable expression of concern
that steps be taken to reduce the data collection burden within a
given year in some manner. Moreover, these concerns were
typically linked with the question of whether it was necessary to
collect all, or any part, of SASS on a two-year cycle, especially
with the possibility of augmenting NAEP with some SASS questions
and sample enhancements (given the differences in target teachers
of the two surveys). Participants' statements most clearly
articulating the issues here are from Linda Darling-Hammond,
Morris Hansen, Tom Kerins, Paul Sandifer, and Brenda Turnbull.
While an as yet undetermined "core" data set may be needed every
2 years, much of the SASS data could be collected less
frequently. Hansen and Sandifer, in particular, urge that NAEP
and SASS be administered in alternate years where a 2-year
interval is considered to be essential.

The notion of a 3-year or 4-year cycle for SASS had 
considerable appeal for a number of participants. Putting SASS on 
a 3-year cycle would have the advantage of making it coincide
with NAEP every 6 years, thus providing possibilities of 
obtaining some merged data without increasing the burden most 
years. A 3-year cycle would also allow additional time to 
analyze the 1988 SASS data before decisions regarding its next 
administration. Such a choice would also postpone the merger 
decision to a point beyond the first planned comprehensive state- 
level data collection. 
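
The cycle arithmetic behind this option can be checked with a short sketch. It is illustrative only: the function name `coinciding_years` and the year-2000 horizon are hypothetical, and it assumes that both surveys are fielded together in 1988 and then keep fixed cycles.

```python
from math import gcd


def coinciding_years(start, naep_cycle, sass_cycle, horizon):
    """Return the years from start through horizon in which both surveys fall.

    Assumes (for illustration) that both surveys are fielded in `start`
    and repeat on fixed cycles of the given lengths, in years.
    """
    # Two fixed cycles line up again after their least common multiple.
    step = naep_cycle * sass_cycle // gcd(naep_cycle, sass_cycle)
    return list(range(start, horizon + 1, step))


# NAEP on a 2-year cycle, SASS on a 3-year cycle, both fielded in 1988.
three_year = coinciding_years(1988, 2, 3, 2000)
# Same NAEP cycle with SASS on a 4-year cycle.
four_year = coinciding_years(1988, 2, 4, 2000)
print(three_year, four_year)
```

Under these assumptions a 3-year SASS cycle lines up with NAEP every sixth year, while a 4-year cycle lines up every fourth year.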

It was pointed out that collecting some of the SASS data less 
frequently might provide some funds for collecting other data 
(e.g. collecting SASS district data every 4 years and collecting 
finance data on alternate occasions) or for conducting some 
special studies (See Darling-Hammond, Kerins, and Turnbull 
statements). Here, again, special studies using existing data 
bases were considered essential, and some means needs to be found 
to ensure that they are conducted.

Issue 5. What sets of analytical exercises/special studies should
be undertaken to address the merger issue in both the short run
and the long run?

Discussion: There are so many unknowns (changes in NAEP in
the future and SASS being completely new) that it is difficult
to think precisely about what would happen if the two were merged
in some way. One important step is to use the 1988 fielding of
both as a pilot for each separately that will provide some basis
for estimating the consequences of any sort of merger. It was
agreed that adequate preparation for a merger would entail
postponing the merged data collection until at least 1992-1994,
especially given the amount of lead time necessary for OMB
clearance and so forth. Several additional suggestions were
made (virtually all participants had specific suggestions for
special studies):

a) CES should do some futures projections to see the costs 
and consequences 

b) There should be some small analysis contracts to look at
the NAEP "public use" tapes regarding the analytical
value of merging parts of SASS with NAEP.

c) It would be useful to consider the various augmentations
and analyze the incremental value of one over the others.

d) Possible pilots for the merger could involve only one or a
small number of states, or perhaps only merge first at the
national level.



One proposal for a special study that warrants special 
mention dealt with the development of a common sampling frame. 
David Wiley suggested that CES might consider giving up the current
NAEP and SASS sampling frames and designing a new one to integrate
both (e.g., in 1992 or 1994). CES would then subdivide students and
teachers according to whether they are in the NAEP sample universe,
draw separate subsamples, and collect some linked data. This idea
might be particularly valuable when NAEP becomes
state-representative. Other participants thought that a feasibility
and cost study of this idea would be worthwhile. However, it
would make the most sense when less than the full SASS is
fielded. There was a belief that the 1988 experience with the
heavy burden in the field without merger might be informative on
this matter. The question of mail vs. in-person survey would
also bear on this decision.



The broad outlines of the recommendations from the meetings 
are evident from the discussions of the issues and the statements 
provided by the participants. The recommendations that achieved a 
general consensus from the meetings and written statements are: 



Major Recommendations

1. A complete merger of the questionnaires and samples from
NAEP and SASS should NOT be attempted in 1990. The risks of
overburdening NAEP in 1990 are too great. Moreover, too little is
known about how SASS will actually function at this time to
assess the benefits and consequences of strong ties with NAEP.

The group consensus was that a complete merger (joint 
administration on same cycle in same sample) is not feasible in 
1990 and probably is not a good idea anyway. The purposes (and 
the samples) of the two studies are legitimately different and 
should be preserved. Although it might be possible to define a 
common sampling frame, this approach might be quite inefficient 
and might have very negative consequences among schools and 
districts due to its perceived and actual burden. There was 
interest, however, in the possibility of a partial merger based
on the desire to explore the issue of student access to teachers. 

2. Whether NAEP and SASS should merge in 1992 or 1994 
warrants further study including analyses of existing data from 
the two surveys gathered through the 1988 data collection. 

3. Regardless of the extensiveness of the eventual merger, 
the analytical purposes that should guide merger efforts should 
be those dealing with informing policy analysis rather than 
enhancing capabilities to conduct school effects or effectiveness 
research in an integrated national or state-representative data 
base. Examples of policy analytic purposes that could be served 
through a "merger" effort are the gathering and maintenance of 
national (and perhaps state representative) indicator series 
dealing with questions of access and participation (e.g., which 
kinds of students receive instruction in which kinds of schools
from which kinds of teachers?).

4. For the short term (e.g., 1990), a small set of teaching
and schooling conditions questions selected from SASS could be
administered with NAEP to enhance its ability to serve policy
analytic purposes. To this end analytical work using past NAEP
collections of teacher and school characteristics as well as
other efforts to identify specific policy analytic purposes to be
served should be carried out in time to modify and augment the
1990 NAEP school and teacher characteristics questionnaires.

5. A three-year or even a four-year cycle for the major SASS
data collection should be considered with at least part of the
resource savings shifted to conducting special studies (e.g.,
longer term study of flow of teachers into and out of the
workforce for a panel of schools and districts; augmentation of
NAEP data collection in 1990; studies of the consequences of the
intensity of respondent burden and cost consequences of major
merger). Alternatively, the SASS instrumentation can be broken
up into smaller sets which could be fielded on different cycles
with perhaps a core set maintained on a more frequent cycle.
Spreading out the SASS cycle would also postpone collection
activities in ways that would place less strain on plans for the
1990 NAEP.

6. Postponing major merger discussions beyond 1990 provides
time and resources to consider (through design and other special
studies) the costs and benefits of developing a merged sampling
universe across the major data collections (including NELS as
well as NAEP and SASS).

7. Attention is needed to the benefits accrued at
the school level from participating in these surveys.
"Contributing to national well-being" is an increasingly weak
incentive given the extensiveness of data collection demands and
competition from data collection with greater extrinsic rewards.



* * * * *



The above summary and recommendations convey the tenor of 
the discussions and written statements. Participants were 
genuinely concerned that the primary purposes of NAEP and SASS 
not be sacrificed or damaged by a hurried decision to merge the 
two. CES is undertaking major modifications and extensions of 
its data collection responsibilities over the next few years. 
Its efforts to date are commendable and the general direction of
the agency was viewed positively by the participants. Nevertheless,
under the circumstances of major changes in responsibilities,
operations, resources, and staffing, time and resources devoted
to further study that enhances the likelihood of fielding and
reporting these collection efforts in an effective and credible
manner are critical. Discussions of mergers of these data
collections need to proceed at a more deliberative pace than at
present. There is just too much at stake.

CRESST QUALITY INDICATORS STUDY GROUP

Report from Meetings on CES Merger of NAEP and SASS

ITEM I

Sample Letter of Invitation to Participate

Dr. David Wiley
227 Sheridan Road 
Kenilworth, IL 60043 

Dear David: 

Thank you for agreeing to participate in the examination of
technical issues in the possible merger of the National
Assessment of Educational Progress (NAEP) with the Schools and
Staffing Survey (SASS). This examination is being conducted as an
activity of the Center for Research on Evaluation, Standards, and
Student Assessment's (CRESST) Study Group on Quality Indicators
to assist the Center for Education Statistics (CES) in its
deliberations of the merger question. The CES Advisory Council
[illegible] segments of the educational community have questioned
the advisability of the merger on grounds of technical,
substantive, practical, and [remainder illegible].

The purpose of this examination is to:

a. Consider the range of issues that CES has already
identified and review their available documentation
regarding these issues;

b. Augment CES's prior analyses with other evidence that
bears on the perceived benefits and costs of the
proposed merger;

c. Assess the likely consequences (e.g., for knowledge
production, enhancing policy analysis capabilities,
improving or degrading the quality of data) of the
merger;

d. Recommend options with regard to the decision process on
the possible merger and the steps that should be
undertaken in advance of a final determination to
proceed with the merger.

The plan of operation for the present activity is to seek
[illegible] with respect to these issues and a specific
set of questions regarding the merger (see enclosed). The
primary means will be through two meetings to be held at the
Center for Education Statistics in Washington on Wednesday,
November 18, and Friday, November 20, 1987. In addition,
written reactions will be obtained from a select set of
individuals unable to attend either meeting (their written input
will be due by November 30, 1987). Participants will include
researchers and policy analysts knowledgeable about the
examination of educational effects through large-scale data
analysis, experts in survey sample design, and representatives
from national, state, and local organizations with an interest in
analyses of education and the conduct of major survey data
collections in the schools. The discussions at the meetings and
the written reactions will be synthesized into a set of
recommendations to CES about viable next steps and their possible
consequences.

We are able to offer a modest honorarium and travel expenses 
for participating in the scheduled meetings. Included for your 
completion and signature is a Consultant Agreement. Please 
return the signed agreement in the enclosed self-addressed
envelope along with your current vita.

We will be contacting you shortly to assist in travel
arrangements and local hotel accommodations and to notify you
about the exact schedule and location for the meetings.

A list of papers and reports that serve as background reading
for the discussions is enclosed. At this point the short issues
papers and Working Paper 2 perhaps represent the most pertinent
if your reading time is restricted.

Thank you in advance for your willingness to participate.



Sincerely, 

Leigh Burstein 
Co-Director, CRESST 
Quality Indicators Study Group 



Eva L. Baker
Co-Director, CRESST
and Co-Director, CRESST
Quality Indicators Study Group

CRESST Quality Indicators Study Group

Meetings on CES Merger of NAEP and SASS 
November 18 & 20, 1987 

Questions and Issues 

1. What analytical advances are afforded from the combination of
the samples for NAEP and SASS?

a. [illegible] to enhancing the analysis of schooling?
instruction? teachers and teaching?

b. Does the existence of a national sample consequentially
enhance the analyses identified in a.?

c. Would the existence of data on a state-by-state basis 
consequentially enhance the analyses identified in a.? 

d. Do the presumed advances represent a unique opportunity 
or simply augment existing efforts (e.g.?) in a 
significant way? 

e. What are the consequences for other data collections 
designed to address related issues? 

2. Is it possible to merge the two national samples without 
adversely affecting the quality of the data to address the 
primary questions the data sets were designed to examine? 

a. Will the resultant respondent burden compromise the
data for assessing educational outcomes from
NAEP and schooling conditions from SASS?

b. Will the compromises in sample selection and design 
consequentially impact each of the separate collection 
efforts? 

3. If the decision were made to proceed with the combination, 
how would one carry it off given the distinctions in the
primary purposes and sampling between the two studies?

a. [illegible] phasing in the combination.
Keeping in mind the planned expansion of NAEP in 1990?

b. What set of special studies, pilot studies, and 

simulations should be carried out before a final decision 
to proceed with the merger (re. pilot test 1989)? 

d. What is a reasonable timeframe in light of data collection
cycles for conducting studies of the merger before a final
decision is made?

CRESST Quality Indicators Study Group

Meetings on CES Merger of NAEP and SASS
Participant List
Wednesday, November 18, 1987

Non-CES
Pam Aschbacher, CRESST, UCLA
Eva Baker, CRESST, UCLA 
Anthony Bryk, University of Chicago 
Leigh Burstein, CRESST, UCLA 
Joe Conaty, OERI 
Morris Hansen, WESTAT 
Dan Koretz, Rand Corporation 
James McPartland, CSOS, Johns Hopkins 

Senta Raizen, National Research Council, National Academy of
Sciences
Paul Sandifer, South Carolina State Department of Education
William Schmidt, Office of Studies and Program Assessment,
National Science Foundation
Ramsay Selden, CCSSO State Education Assessment Center 
Brenda Turnbull, Policy Studies Associates 



CES Staff 
Emerson Elliott 
Jeanne Griffith 
Anne Hafner 
Carrol Kindel 
Don Malek 
Eugene Owen 
Mary Papageorgiou 
Gary Phillips 
Paul Planchon 
Iris Silverman 
Nancy-Jane Stubbs 
David Sweet
Doug Wright

CRESST Quality Indicators Study Group
Meetings on CES Merger of NAEP and SASS
Participant List
Friday, November 20, 1987

Pam Aschbacher, CRESST, UCLA
David Bayless, WESTAT 
Al Beaton, NAEP, ETS 
Leigh Burstein, CRESST, UCLA 
Joe Conaty, OERI 

Linda Darling-Hammond, RAND Corporation 

Ed Haertel, Stanford University 

George E. Hall, Slater Hall Information Products

Tom Kerins, Illinois State Department of Education

Sally Kilgore, OERI

Doris Redfield, OERI

Ramsay Selden, CCSSO State Education Assessment Center
W. Ray Turner, Dade County Schools, Miami, Florida 
David Wiley, Northwestern University 



CES Staff 
Emerson Elliott 
Jeanne Griffith 
Carrol Kindel 
Don Malek 
Eugene Owen 
Mary Papageorgiou 
Gary Phillips 
Paul Planchon 
Iris Silverman 
Nancy-Jane Stubbs 
David Sweet 
Doug Wright 

Participants Providing Written Input Only 

Richard Jaeger, University of North Carolina, Greensboro 
Richard Murnane, Harvard University 
Marshall Smith, Stanford University

CRESST Quality Indicators Study Group

Meetings on CES Merger of NAEP and SASS
November 18 & 20, 1987

MAIN POINTS IN DISCUSSIONS
WEDNESDAY, NOVEMBER 18, 1987

The group attending the meeting on Wednesday considered the
original list of questions and issues that were distributed in
advance of the meeting. The main points raised in those
discussions included the following:

1. What is meant by a "merger" of NAEP and SASS is subject
to a variety of interpretations. Strong merger implies joint
administration on a repeating cycle in the same sets of schools.
Weak merger can be accomplished in a lot of ways, with the most
benign and obvious being a move toward comparable wording where
current intents overlap and inclusion of additional SASS-type
questions within the NAEP schooling conditions data collection.

2. There was a strong commitment that the primary purposes of
NAEP and SASS should be preserved at all costs. Any risks to
those purposes should be avoided. Preserving the outcome series
from NAEP nationally and establishing the teacher characteristics
and flow series (on a state basis) were considered to be of
greatest importance.

3. Strong merger of NAEP and SASS for the primary purpose
of improving the relational analysis of the impact of schooling
conditions on student outcomes would be a mistake. Basing
relational analysis of the causal effects of schooling conditions
on cross-sectional studies is a bad idea (misleading is the mild
form of the criticism); longitudinal studies are essential for
such analyses.

4. Improving and modifying NAEP data collection in the
schooling domain to provide better "descriptive" analyses of
trends is potentially of benefit, as is the possibility of
presenting evidence on the relation of student characteristics to
the characteristics of the school conditions they receive. But
more preliminary investigation is needed to determine just what
types of enhancements in the descriptive capacity of NAEP are
worthwhile. Moreover, while there may be some justification for
national samples for such purposes, the additional benefits of
state-level samples for these purposes are less clear. Support
for this point implies enhancing NAEP's data collection without
moving toward major merger.

5. There was much sentiment for modification of the
"perceived" plans for the administration cycle for SASS rather
than pushing toward strong merger. The primary argument was that
the plans (and presumed strong merger) would force more frequent
fielding of SASS than is viewed to be necessary for its primary
purpose. Expanding the period between administrations of SASS was
strongly recommended by some participants. Administering SASS
out of phase with NAEP was also proposed on the grounds of the
potential respondent burden. While the 1988 fielding of SASS is
firm, there was some support for going at least 3 years before
repeating this collection. Besides the concern for burden, there
was a strong interest in fully developing what is a new
initiative without complicating both it (SASS) and NAEP (assuming
state level data collection in 1990).

6. More attention should be paid to planning the kinds of
special studies that would inform decisions down the road about
data linkages than to the push for 1990 merger of the main CES
collections. Such studies should include investigations of the
respondent burden from more intensive collection within the
cross-sectional surveys (implicit in the NAEP-SASS strong
merger).

7. More attention should be paid to the questions of
benefits to participating districts, schools, and teachers.
Arguments of intrinsic merits of serving national interests are
insufficient in light of competing data collection burdens.

8. The question of partial paneling of SASS and perhaps 
NAEP needs further exploration. 
November 24, 1987



TO: Participants, CRESST NAEP-SASS Merger Discussions 
FROM: Leigh Burstein 
RE: Next Steps 

Now that I am back, I want to take the opportunity to thank
you for your participation thus far in the CRESST-sponsored
discussions on merging NAEP and SASS. The sense I have gotten
from CES and some of you is that the meetings went very
well. The major issues were aired and received thorough, if not
always extensive, discussions. I feel confident that we will be
able to provide CES with a set of recommendations that can assist
their decision process.



As discussed at the end of both meetings, the timeframe
for input from this activity is very short. It was agreed that I
would make a presentation to the CES Advisory Council on Monday,
December 14th. To this end, I urge each of you who attended
either meeting (and those who did not as well) to provide me a
brief written statement regarding your summary recommendations
by Thursday, December 3rd at the latest. This statement could
address the issues and questions as raised initially, various
points that came up during the discussions, or ideas you had
reflecting on the discussions. The summary of Wednesday's main
points that was distributed on Friday and Dick Murnane's letter
are enclosed to assist you in this next phase.

I also want to provide a brief summary of what I thought
occurred during the meetings. There was consistency in the
issues discussed during the two days; my quick, rough list is
as follows:

1. What does "merger" mean and how comprehensive (with 
respect to instrumentation and to samples) should it be? 

2. What analytical purposes should guide any merger 
decisions?



3. What are the likely consequences of alternatives with
respect to respondent burden and costs?




4. How does the question of the desirable/necessary 
cycle/periodicity and timing of SASS (or parts of SASS)
interact with the above? 

5. What sets of analytical exercises/special studies should 
be undertaken to address the merger issue in both the short run 
and the long run? 

My sense was that while the emphases on the two days
differed considerably, there was a general consensus that 

1. A major merger of the questionnaires and samples from 
NAEP and SASS should not be attempted in 1990. Whether such a 
merger should occur in 1992 or 1994 warrants further study 
including some basic analyses of existing data from the two 
surveys gathered through the 1988 data collection. 

2. Regardless of the extensiveness of the eventual merger, 
the analytical purposes that should guide the decision process 
should be those dealing with informing the policy analytic 
process rather than the enhancement of capabilities to conduct 
school effects or effectiveness research in an integrated 
national or state-representative data base. Examples of policy
analytic purposes that should be supported through any "merger"
effort are the gathering and maintenance of national (and perhaps
state representative) indicator series dealing with questions of
access and participation (e.g., which kinds of students receive
instruction in which kinds of schools from which kinds of
teachers?).

3. In the short term, careful consideration should be given 
to drawing from the SASS instrumentation teaching and schooling 
characteristics and conditions questions that would enhance 
NAEP's ability to serve policy analytic purposes. To this end 
analytical work using past NAEP collections of teacher and school 
characteristics as well as other efforts to identify specific 
policy analytic purposes to be served should be carried out in 
time to modify and augment the 1990 NAEP school and teacher 
characteristics questionnaires. 

4. Certain functions of SASS do not require two-year cycles.
A three-year or even a four-year cycle for the major data 
gathering of SASS should be considered with at least part of the 
resource savings shifted to enhancing certain special studies 
(e.g., longer term study of flow of teachers into and out of the 
workforce for a panel of schools and districts; augmentation of 
NAEP data collection in 1990; studies of the consequences of
the intensity of respondent burden and the cost consequences of
major merger).

5. Postponing major merger discussions beyond 1990 provides
time and resources to consider (through design and other special
studies) the costs and benefits of developing a merged sampling
universe across the major data collections (including NELS as
well as NAEP and SASS).

There were other points that might appear in the summary 
recommendations I will prepare for the CES Advisory Council and 
circulate among participants. We will also prepare a longer 
report from the meetings. 

My plan is to draft the summary recommendations for the CES
Advisory Council and circulate them to you (along with copies of
the written statements from participants) by December 10th. Any
suggested changes will need to be offered immediately to impact
the version to be presented to the CES Advisory Council on
December 14th. I also expect to attach the written statements as
appendices to the summary recommendations unless there is
objection.

That's about it for now. If you have any thoughts on the
above and would like to discuss them with me, please call. I will
be in town through December 6th (213-825-1889; 818-883-9185). 
Thanks again for your participation. 
1. David Bayless, Westat
2. Albert E. Beaton, NAEP, ETS
3. Linda Darling-Hammond, The RAND Corporation
4. Edward Haertel, Stanford University
5. Morris Hansen, Westat
6. Richard Jaeger, University of North Carolina
7. Thomas Kerins, Illinois Board of Education
8. Daniel Koretz, The RAND Corporation
9. Richard Murnane, Harvard University
10. Doris Redfield, OERI
11. Paul Sandifer, South Carolina Dept of Education
12. Marshall Smith, Stanford University
13. Brenda Turnbull, Policy Studies Associates
Merger of NAEP and SASS
The Relationship of Risks of Non-Cooperation
to Level of Commitment and
Total Level of Data Collection Burden

by:

David L. Bayless
Westat, Inc.

[Illegible] to a large-scale national study is the expected lack of
cooperation of [illegible] of the educational system if the data
collection activities of the study [illegible] burdensome or there is
a lack of commitment. The educator's [illegible] instructional
services, which is in natural conflict with, or an obstacle to,
providing assistance in the collection of data for a research study.
Political, philosophical, and other factors also contribute to the
lack of cooperation.

[Illegible] the educator's decision to participate in the data
collection activities of the study. It is hypothesized that the risk
of non-cooperation is related to: (1) the level of commitment to the
study felt by the educators and/or [illegible], and (2) the total
level of data collection burden [illegible] implemented. The total
level of data collection burden is measured by [illegible] to respond
to the study instruments and the operational [illegible] plus the
total level of data [illegible] school) is [illegible] achievement
testing programs, other [illegible]. Although considerable work is
[illegible] the theoretical relationship between the risk of
non-cooperation and the level of commitment and level of data
collection burden, based upon my practical [illegible] this
relationship is important to understand before considering whether
NAEP and SASS should be merged.

[Illegible] gathering data on this relationship so that [illegible]
findings can guide the planning of mergers of large national studies
such as NAEP [illegible] relationship, extraneous or blocking factors
such as sectors of [illegible] of educational system (National,
state, district [illegible] secondary (schools, teachers, and
students)) are factors that should be incorporated [illegible] the
design of [illegible] considered in the study of the relationship
between risk of non-cooperation and commitment and burden.

[Illegible] instruments to measure and collect data about the
relationship of risk of non-cooperation to the level of commitment
and data collection burden, [illegible] following comments and
observations that CES should take account of [illegible]
considerations concerning the potential merger of NAEP and SASS.

From the beginning of NAEP (1969) to the present day, the priority
for the NAEP [illegible] National estimates (not state-by-state
estimates) for the nation and [illegible] with the NAEP survey has
been [illegible] district and school levels. Natural conflicts
between the data collection burden of NAEP and the burden of other
National, state, and local data collection activities have existed
and will continue to exist. The level of commitment by educational
executives to NAEP has been adequate primarily because the data
collection burden has not been excessive. Under the proposed sampling
plan in 1988, 44 percent of the states will have over 50 percent of
their districts in at least one of the National data collections of
NAEP, SASS and NELS. It is my prediction that this increased data
collection burden will raise the risk of non-cooperation and will
affect the quality of the collected data. Let me illustrate this view
in relation to the data collection activities of SASS.

The data collection method for SASS of 1988 is to be conducted via a
mail survey, which will add an extra data collection burden to the
schools (e.g., school personnel will expend time [illegible]
collected), which in most cases is a cost to the local school system.

Also the data collection burden is at a very high level in terms of
the number of sample members selected. Concern has to be expressed
about cooperation or the risk of non-cooperation beyond [illegible]
in important National studies such as NAEP and NELS where much of the
operational burden to collect the data is conducted by a person
external to the school. Damage could be done to the quality of the
data for these other studies in future years.

If a priority is to maintain the National data collection activities
of NAEP and/or SASS as [illegible] cooperation rate and data that
conform to strict statistical data collection standards (quality),
then only those states whose level of commitment is high should be
invited to piggyback onto the NAEP and/or SASS sample. States whose
level of commitment is low and/or whose total data collection burden
(e.g., large state and/or local assessment programs) is large should
not be a part of the state level NAEP or SASS studies. Such a plan
would reduce the natural conflict that exists between the National
data collection activities and the state and district data collection
activities and improve upon the "volunteerness" of the data
collection tasks to the educational unit at the state, district, and
school level.

If state-by-state estimates are required for NAEP and/or SASS, then,
in my view, to maintain appropriately high cooperation rates and data
quality the data collection activities need [illegible] assessments
(e.g., study participation is not voluntary). If this is to be the
case, then I would strongly recommend that concerned state and local
school officials be an integral part of making the data collection
activities a legal requirement. CES should research this issue by
assessing the preferences and opinions of the state, district and
school officials (both private and public) in 1988 as to the
practical concerns about a legally mandated data collection at a
level that will provide separate data by states
On Merging NAEP and SASS



Albert E. Beaton 
November 25, 1987 



[Illegible] the National Assessment of Educational Progress (NAEP)
and the School and Staffing Survey (SASS) would [illegible] the data
base of the other. However, the details of such a [illegible]
important because a rush to merge might result in [illegible] of the
individual surveys.

[Illegible] purposes for the merged data. For example, it would
[illegible] have more detailed information about the teachers of
[illegible] performing science students and about the teachers
[illegible] performance high. It would [illegible] useful to know
more about the teachers of students who have highly educated parents.
Information about teacher and student attributes is useful in
describing how [illegible]. [Illegible] NAEP reports include some
teacher information, but more would be better. [Illegible] those who
analyze such data will have to be careful not to attribute causation
to relationships found in survey data, but [illegible] survey data
may lead to hypotheses which can be tested by appropriate
experiments.

[Illegible] teacher data have been used by Longford, Johnson, and
King to explore the question of the amount of student [illegible]
teachers and schools using a multi-level model [illegible] presented
soon. Multi-level models such as [illegible] Longford, which was used
in this study [illegible] substantial promise for exploring the
relationships [illegible] characteristics and student performance.

[Illegible] of present NAEP teacher data in the future. To encourage
this use, and other uses, a sample of NAEP data has been [illegible]
a floppy disk and a [illegible] is being [illegible] analysts use the
NAEP data on the floppy disk as well as [illegible]. This Primer
shows in detail how [illegible] use the student and teacher data; in
fact the [illegible] floppy disk appended the teacher data for
[illegible] to explore. My belief is that NAEP would be seriously
hurt if no teacher data were available. The extended SASS [illegible]
teacher data [illegible].

[Illegible] SASS may be difficult to work out, [illegible] see any
problem that definitely could not be overcome with proper planning
and experimentation.
[Illegible] SASS must be coordinated as long as the SASS occurs in
the same years as NAEP. Although the [illegible] to different schools
(as [illegible] 1988), they cannot avoid using many of the same
school districts. The spectre of two different organizations
requesting cooperation from the same school districts in an
uncoordinated way would almost [illegible] to cooperate and thus the
diminution of both surveys.

[Illegible], then, is whether to minimize or maximize the overlap of
the samples. Minimizing the overlap spreads a lighter burden over
more schools; maximizing places a heavier burden on fewer schools.
Merging the two data bases implies maximizing the overlap so that as
much information as possible would be in the [illegible] no
[illegible] of knowing at this time [illegible] on selected schools
and teachers would affect the cooperation rates.

[Illegible] already wary of the intrusion of NAEP. During the 1986
assessment, we experienced more difficulty than in any [illegible]
in gaining the cooperation of the schools to participate. More and
more schools are feeling the burden of a [illegible] and research
programs and becoming dissatisfied. NAEP is having to exert
tremendous pressure and commit to expensive services in order to
maintain our traditional response rates.

[Illegible] should also note some differences between the NAEP and
SASS [illegible]. NAEP samples fourth, eighth, and twelfth grade
students and 9-, 13-, and 17-year-olds. SASS samples teachers
[illegible] grades and thus teachers in schools where NAEP does not;
[illegible] of randomly selected students. SASS is intended to make
statements like, "11% of fourth grade teachers..." whereas NAEP is
intended to make statements like, "11% of [illegible] graders have
teachers who..." While these differences in sample properties can
presumably be worked out, it may be that overlapping teacher samples
would be drawn for the surveys. The details of the sampling must be
satisfactorily worked out before a merger can responsibly proceed.
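
The difference between a teacher-based statement ("X% of teachers...")
and a student-based statement ("X% of students have teachers who...")
can be illustrated with a minimal sketch. The four teachers, their
class sizes, and the attribute flag below are made-up values chosen
only for illustration; they are not drawn from NAEP or SASS, and the
function names are hypothetical.

```python
# Hypothetical toy data: one row per teacher, with an attribute flag
# and the number of students that teacher serves.
teachers = [
    {"id": "T1", "has_attribute": True, "students": 10},
    {"id": "T2", "has_attribute": False, "students": 30},
    {"id": "T3", "has_attribute": False, "students": 30},
    {"id": "T4", "has_attribute": True, "students": 30},
]


def pct_of_teachers(rows):
    """Teacher-based estimate: every teacher counts once."""
    flagged = sum(1 for r in rows if r["has_attribute"])
    return 100.0 * flagged / len(rows)


def pct_of_students(rows):
    """Student-based estimate: each teacher is weighted by students served."""
    total = sum(r["students"] for r in rows)
    flagged = sum(r["students"] for r in rows if r["has_attribute"])
    return 100.0 * flagged / total


print(pct_of_teachers(teachers))  # share of teachers with the attribute
print(pct_of_students(teachers))  # share of students taught by such teachers
```

With these made-up numbers, half of the teachers have the attribute,
but only 40 of the 100 students are taught by such a teacher, so the
two kinds of statement give different percentages from the same data.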

[Illegible] factor affecting a merger is that the details of the 1990
NAEP have not yet been determined, and the pilot study of
state-by-state assessment for that year has not yet been funded.
[Illegible] the details of full state-by-state assessment in
[illegible] funded, have not been planned or approved. Present
thinking is that NAEP will assess, state-by-state, twelfth grade
students in mathematics in 1990, if funded. In 1992, NAEP hopes to
[illegible] subjects in each of three age/grade combinations, if
funding is available.

If state-by-state funding is not available for NAEP, the
overlap of samples will probably be slight, and it is doubtful that 
merging the surveys will have any benefit for either 
On the other hand, when NAEP is fully funded for state-by-state
assessments at all of [illegible] age/grade levels, coordination in
field operations [illegible] necessary and the opportunity to merge
data bases may be [illegible] to [illegible] surveys. Only the
question of the costs and benefits [illegible].

[Illegible] Therefore, it seems reasonable to attempt to estimate the
[illegible] to study the [illegible] merger procedures. Instead of
deciding to merge or not to merge, attempt merging in one or a few
states in 1990 if [illegible] differences in cooperation rates would
be particularly important. After some practical experience, an
extensive bridge study might be in order to assure that the
continuity of NAEP is not lost.
Dr. Leigh Burstein
Graduate School of Education
University of California
138D Moore Hall
405 Hilgard Avenue
Los Angeles, CA 90024-1521

Dear Leigh: 

In response to your request for further comments on the proposed
merger of NAEP and SASS, here are mine. In a nutshell, I believe
there are considerations that argue strongly against a full-fledged
merger of NAEP and SASS, and that make the consideration of any
merger in 1990 inadvisable. Considerations of how/whether some
elements of the two data collections might be usefully integrated
should be examined carefully in the light of specific analytic
benefits, respondent burden, data objectives, and periodicity of the
data collections before a decision to seek merger after 1990 is made.
Per your request, I am also including a very brief discussion of the
components of SASS.

There are two distinct and separable rationales for the proposal to
merge NAEP and SASS: (1) analytic benefits to be obtained by adding
data about districts, schools, and teachers collected in SASS to data
about schools, teachers, and students collected in NAEP; (2)
efficiency of [illegible] obtained by using the same samples for NAEP
and SASS. A third consideration is the practical feasibility of
[illegible] collection that is in the process of substantial change
(NAEP) with one that is as yet untested (SASS). These are considered
below.

Analytic Benefits 

I do not see major analytic benefits to be derived from merging the
NAEP and SASS samples and instrumentation wholesale. First, much of
the data collected in SASS is designed to support analyses of teacher
supply and demand and to provide estimates of school and teacher
characteristics for the overall population of schools and teachers.
In a cross-sectional sample, these data will not prove highly useful
for modelling school effects on student outcomes.

Second, those characteristics of schools and teachers that provide 
[illegible] experiences of students tested by NAEP are largely
already included in the NAEP background data collected from school
[illegible] and the teachers who teach the tested students. (Note
that the different goals of the data collections require entirely
different sampling of teachers. Whereas SASS seeks to describe the
population of all teachers, NAEP seeks to describe the
characteristics and practices of teachers who teach students assessed
in a given year, e.g., the English teachers of those students
assessed in the Reading and Writing Assessment in a given year. Thus,
the teacher samples cannot be meaningfully merged.) Where there are
particular gaps in the NAEP [illegible] information about the
qualifications of teachers), some modification of NAEP instruments
would be sufficient to allow analyses of, say, the qualifications of
teachers who serve students of different types.

Efficiency of Merged Samples 

Given proposals to expand NAEP to state sampling and plans to do so
in SASS, there is the obvious question as to whether merging the
samples would provide less overall respondent burden for the data
collections and result in lower costs for data collection. There are
three questions here that need to be evaluated:

1. Will concentrating respondent burden on fewer total districts and
schools reduce overall burden? Will it reduce respondent
participation or response rates? Reduction of overall burden would
require streamlining the data collection instruments for the two
studies. Given the relatively low degree of overlap between them,
this would, I believe, result in very little reduction of overall
burden, unless some data elements and survey goals are eliminated
from one or the other study. This will require hard choices about
objectives for either NAEP or SASS that can be given up.
Concentrating respondent burden could lead to lower participation
rates, as Joe Turner of Dade County suggested at our meeting. Given
the increasing reluctance of states and districts to cooperate in
federal data collection efforts, this should be an important concern
given careful examination.

2. Will merging samples save administrative costs? This is an
empirical question about which I believe there is little consensus at
the moment. Contractor costs for contacting districts and schools
would obviously decrease if the same contractor administered both
collections in an overall smaller sample of districts than would be
obtained in independent administration. On the other hand, the costs
of securing cooperation for a much larger scale activity and managing
the complexities of drawing separate samples of teachers (and
perhaps, in some cases, schools and districts as well, to satisfy the
different analytic goals and estimation objectives of the two
collections, which produce different sampling considerations) will
offset the above savings to some unknown extent.

3. To what extent will the analytic goals of SASS and NAEP be met
with the same sampling specifications? As mentioned above, SASS
requires representative samples of districts, schools, and teachers
to support estimates of characteristics and practices overall and for
certain specified strata (e.g., districts by size, urbanicity,
[illegible] representative samples of students, usually selected to
be highly clustered in a much smaller sample of schools ([illegible]
characteristics are not the major focus of the data collection), with
oversampling of schools by ethnicity and other characteristics of
students served in order to support estimates of student achievement
for particular subpopulations. Though it is not technically
impossible to design samples that serve both goals or to [illegible]
data to serve the purposes of different analyses, the trade-offs or
inefficiencies in sampling require examination before the cost
savings of merged samples can be assessed.

Practical Feasibility

A major consideration in the decision as to whether some merger is
[illegible] of [illegible] collections will be in 1990. Proposals to
revise NAEP, currently being considered in Congress, [illegible]
sampling and possible local add-ons, changes in both the nature and
frequency of assessment in various subject areas, and changes in the
governance structure of NAEP. Other proposals that have been raised
by the Alexander-James Commission and the National [illegible] be
further pursued by the new governing body of NAEP. These include
making NAEP a longitudinal rather than cross-sectional assessment,
expanding the (undefined) policy analytic capacity of NAEP, extending
its capacity to support analyses of school effects, changing the
scaling and reporting features of the [illegible] years, substantial
changes will be made to the design and conduct of NAEP which will
totally alter the nature of the data collection activities and will
reframe the questions about the desirability or feasibility of merger
with any other data collection system. Plans to merge NAEP with SASS
will be shooting at a moving target.

At the same time, the first fielding of SASS in 1988 will produce
substantial information about changes required in the management of
that equally mammoth data collection activity. However, analyses of
the initial experiences with SASS will not be available until at
least 1989, past the point when planning for a 1990 merger would have
to have been well underway. Indeed, a very important goal for the
Center is establishing the periodicity of major data collections in
such a way that past efforts can inform the subsequent data
collections, that time for adequate field testing and analysis of
field test results is permitted, and that energies can be devoted to
data analysis as well as data collection. Finally, SASS has a number
of different components which, though currently joined, may not need
to be maintained in tandem in future data collections. Thus, many
different options are available for achieving data collection goals
short of either full merger, on the one hand, or simultaneous
independent fieldings of NAEP and SASS every two years, on the other.

Components of SASS 

SASS currently includes surveys of school district administrators
and public and private school principals and teachers in linked
samples of districts and schools. A follow-up survey of teachers in
the year after the baseline survey is also planned to track teacher
mobility and attrition and to compare leavers to stayers. The data
set is designed to support analyses of teacher supply and demand
(data elements for these analyses are lodged in each of the district,
school, and teacher surveys); and to describe school programs and
services, teacher and administrator characteristics and working
conditions, and school staffing patterns for different states,
sectors, and levels of schooling.

Fielding the surveys with all of these respondent groups and data
elements joined is a useful strategy in the first year of
implementation because it permits continuing time-series for some
data elements (e.g., counts of teachers by field, and teacher demand
and shortage estimates) while launching some new time-series for data
that are much-needed but have not been collected by CES in the recent
past (e.g., estimates of teacher turnover, characteristics of the
teaching force). In addition, some multi-level analyses are made
possible by the linked samples of districts, schools, and teachers.
However, the surveys may not need to be conducted in precisely the
same form or packaged precisely in this way each time.

There are many possibilities for decoupling elements of SASS
depending on how often certain kinds of data are needed and whether
all of the data elements are necessary for state level analyses on a
regular basis. For example, the Center has already considered using
the district survey to collect data on teacher demand and shortage on
an alternating basis with data on district finance and expenditures.
Data on teacher attrition rates, mobility, and sources of supply can
be collected from a few items in the school survey if they are needed
on a more frequent basis than other data elements. (Given the burden
and costs associated with the full SASS data collection, RAND had
recommended this strategy as an option in designing the survey.)
State estimates may not be needed for every state in each cycle;
samples could be drawn to provide national and regional estimates
regularly and state estimates for a rotating third (or some other
fraction) of the states during each data collection. Data on school
programs and services may not be needed with the same periodicity as
data on teacher characteristics. And so on.
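
The "rotating third" idea can be sketched as a simple round-robin
assignment. The sketch below is only an illustration under stated
assumptions: the state labels, the number of groups, and the function
name rotation_schedule are hypothetical, and a real design would have
to address sample sizes, stratification, and weighting, none of which
is shown here.

```python
def rotation_schedule(states, n_groups=3, n_cycles=3):
    """Assign states to n_groups and rotate which group receives a
    state-level sample in each data collection cycle (hypothetical)."""
    # Deal the states into groups in round-robin order.
    groups = [states[i::n_groups] for i in range(n_groups)]
    # In cycle k, the group with index k modulo n_groups is sampled.
    return {cycle: groups[cycle % n_groups] for cycle in range(n_cycles)}


# Placeholder labels standing in for states.
example_states = ["A", "B", "C", "D", "E", "F"]
schedule = rotation_schedule(example_states)
for cycle, group in schedule.items():
    print(cycle, group)
```

Under this assignment every state receives a state-level sample once
in every three cycles, while national and regional estimates could
still be produced in each cycle from the full national sample.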

In my opinion, a full fielding of SASS on an every two-year cycle is 
probably not needed and may push the limits of the Center's capacity. 
Such a cycle allows almost no time for refinement of the survey design 
based on analysis of the prior cycle's data and data collection 
experience, and virtually eliminates the possibility of field testing 
any proposed changes. Given that most of the information provided by
SASS has not been collected or reported for many decades, a 3-year
cycle may prove sufficiently timely. Alternatively, staggering the
number of states for which representative samples will be drawn or
the data elements that will be included in each 2-year cycle could
also reduce overall costs and respondent burden. The point is that
when considering the costs and benefits of data collection strategies
or the possibilities for merging some aspects of different data
collections, it is useful to consider a variety of options for
meeting various data
collection and reporting goals, rather than thinking of SASS (or NAEP 
for that matter) as a single giant blob. 

A Note on NAEP 

I believe that some of the rationale for merger on analytic grounds
derives from lack of familiarity and use of the full NAEP data set
including its school and teacher survey components. There is also a
fair amount of variability in the content of the data set from one
assessment to the next. Each panel has its own views on what is
important to measure. Given the changes in each assessment in the
nature of the teacher samples and the types of background questions
asked of school staff as well as students (and the changes in item
sampling strategies that have influenced what kinds of analyses can be
performed), it may not be surprising that the analytic potential of NAEP
has not yet been fully exploited. It may well be worth undertaking a
systematic exploration of what key analyses are desired from NAEP (or
from a NAEP/SASS merger) to ascertain the degree to which -- and the
ways in which -- they can be accommodated within the current structure
of NAEP on a regularized basis.

I hope this is helpful to your and the Center's efforts, Leigh. I
thought the meeting was very useful.

Sincerely,

Linda Darling-Hammond
Director,
Education & Human Resources Program
LDH:nr
FR: Edward Haertel
RE: Reflections on the desirability of a NAEP/SASS merger

From the discussion at CES on Friday, November 20, it seems clear that a
NAEP/SASS merger at this time is ill-advised. Nonetheless, some of the ideas
aired might lead to improvements in both NAEP and SASS.

What is meant by a NAEP/SASS merger? I take merging NAEP and SASS to
mean that in 1990 or in 1992, the NAEP sample would be defined with schools
rather than counties or county clusters as PSUs, and the same set of schools
would then be asked to respond to the NAEP questionnaires and the SASS
questionnaires, probably at about the same time.

Details of the sampling of respondents within schools under such a
scheme are unclear. Presently, NAEP draws a sample of students and then
administers a questionnaire to the teachers of those students sampled. In high
schools, only teachers in particular subject areas are included, depending on
the content area of the assessment. SASS draws a sample of all teachers in
the school.

Threats to Continuity of NAEP Trend Data

Merging NAEP and SASS could jeopardize the continuity of NAEP trend
data in two ways: by compromising school or teacher cooperation due to a
more concentrated respondent burden and by altering the characteristics of
the NAEP sample to accommodate SASS. Maintaining the NAEP trends must
remain a paramount concern.

Concentrating respondent burden. In order to realize most of the
potential benefits of a merger, it would be necessary to link SASS teacher
survey responses to NAEP student data at least at the level of the classroom;
linkage only at the school level would be much less useful. Thus, some
coordination of NAEP and SASS sampling within schools would be required.
This would concentrate the burden of responding on the teachers of NAEP
student respondents, possibly leading to poorer teacher compliance. The
increase in total person hours required for data collection in a sampled school
could also jeopardize NAEP's exceptional school participation rate.
Altering the NAEP sample. Further changes in the NAEP sample [illegible]
accommodate [illegible] spiraling and those already planned to enable state-level
comparisons could also jeopardize the continuity of NAEP trend data. The
magnitudes of biases that might be introduced by such design changes are
difficult to estimate, but even small perturbations could disrupt trends.

Possible Justifications for a NAEP/SASS Merger

Despite these risks to the integrity of the NAEP program, several
[illegible] specific questions could be answered through such a
linkage seem to end in one of two places. Most such questions could be
answered through some modest redesign of the NAEP teacher and principal
questionnaires. Presently, these questionnaires are driven by the
[illegible] conceived more broadly, and [illegible] addressed through
small NAEP [illegible] How large are [illegible] resources provided to
different groups of learners? (E.g., children at risk, children in large urban
[illegible] linguistic groups [illegible] achievement?) This is an
[illegible] addressed through a NAEP/SASS [illegible] characteristics of
the students taught [illegible] of total educational resources
[illegible] a merged NAEP/SASS data collection. Adequately addressing these concerns
[illegible] might be accomplished
through an intensive sample survey along the lines of Hall, et al.'s proposal,
but is not a reasonable objective for NAEP and SASS.

Merger would enhance the usefulness of SASS by providing [illegible]. There is
no question but that the usefulness of SASS as part of [illegible] indicator
[illegible] SASS database [illegible] academic research
questions. Numbers from SASS could tell much more about the health of the
educational system if it could reveal the educational consequences of different
levels of staff qualifications or resource allocations. That being said, it does
not follow that NAEP is a good source of the needed outcome information. Even
though NAEP is now moving in the direction of providing summative
achievement measures for individual students, its primary purpose remains to
survey trends in aggregate performance on relatively narrow curriculum
elements, and that is what it is designed to do best. I don't have a better
solution. I am pessimistic about attempts to link or equate data from
independent, ongoing assessments using different tests, and at the same time
I am reluctant to increase the testing burden on students. One possibility,
especially in larger states, would be to link SASS data to state assessment
data. California's CAP test, for example, provides solid data on ten percent of
the students in the nation. If SASS instruments in California schools could be
linked to CAP results, the usefulness of SASS could be increased without
jeopardizing NAEP data.

Merger would save data collection costs and reduce total respondent
burden. It appears from Friday's discussions that there is not enough
information available to estimate the magnitude of possible cost savings.
Further study of this question would be helpful, but it is unlikely that savings
would be sufficiently large to outweigh the risks of a merger to continuity of
NAEP trends. It bears repeating that total respondent burden has less to do
with respondent cooperation or with data quality than does the amount of time
and effort required of individual respondents.
TO: Leigh Burstein                                    December 2, 1987

FROM: Morris H. Hansen, Westat

SUBJECT: CRESST NAEP-SASS Merger Discussions

I have reviewed your summary of the meetings on November 18 and 20 and
also Richard Murnane's comments. I feel that I have little to add that hasn't already been
covered in these two documents. They generally present a similar point of view, with
which I am in general agreement, subject to the following additional comments.

(1) Your point 7 in the summary states that serving national interests is an
insufficient incentive to school (or district) participation, in the light of
competing data collection burdens. This seems to suggest that specific
feedback of individual school summaries into the schools or school districts
from studies such as NAEP might be necessary to obtain cooperation. I
believe that school benefits (and incentives) can be demonstrated through
more general means, if the programs can be reasonably shown to be
effective in guiding improvements in state and federal programs, curricula,
etc., that of course benefit the schools. Effective cooperation with NAEP
has been obtained in the past, without such specific feedback. I believe
more extensive general uses and applications of NAEP and other
worthwhile programs that are positive can be presented in a way to obtain
cooperation, and should not be undersold. Other important national
statistical programs in education and in other subject areas survive and have
achieved effective cooperation without such specific feedback. Making
cooperation depend on such feedback may lose the cooperation of schools
that do not see an explicit benefit from the feedback.

(2) At least for the near future I believe it desirable to emphasize, as you have
suggested, the desirability of fielding NAEP and SASS in different years (at
least if NAEP is extended to a sample by states).

Again, your summary is not only an excellent summary of what was discussed but
presents a point of view with which I generally agree.
Comments on Merging SASS and NAEP

Richard M. Jaeger
University of North Carolina at Greensboro
6 December 1987

I agree with the developing consensus that NAEP and SASS not be
merged in 1990. In making this recommendation, I am defining merger as:

1. Redefining the national NAEP sampling plan so that sampled
teachers become a subset of those sampled for SASS;

2. Use of an expanded questionnaire for sampled teachers that
incorporates virtually all questions presently used or planned for
NAEP and all questions planned for SASS;

3. Use of identifiers that allow linking of student records, teacher
records, school records, and school district records;

and

4. Ensuring that reasonably precise estimates of relational statistics
can be formed for at least nationally representative samples of
students in the NAEP-sampled grades and their teachers, students in
the NAEP-sampled grades and their schools, teachers and their
schools, teachers and their school districts, and schools and their
districts.

I find the questions raised at the meetings on November 18th and 20th
sufficiently compelling to convince me that the risks resulting from a 1990
merger of NAEP and SASS outweigh the potential benefits. In particular, the
risk of jeopardizing the NAEP time series is substantial, and the nation can
ill afford the disruption of that time series since NAEP currently provides
the only trustworthy, nationally representative, longitudinal data on student
achievement and academic progress. In addition, the potential benefits of a
merger of NAEP and SASS, although discussed in the abstract in various CS
documents, do not appear to be well articulated. And in the abstract the
case is not convincing.

The position advanced in the paper entitled Alternatives for a
National Data System on Elementary and Secondary Education (Hall,
Jaeger, Kearney & Wiley, 1985), that CS should develop an integrated
national data system, rather than a series of unarticulated surveys, should,
in my view, guide the long-term redesign of the [illegible] for
collection of information concerning education and schooling. However,
movement toward that goal should be gradual, based on a clearly articulated
plan for analyzing and reporting resulting data, and based on a substantial
body of research concerning the likely benefits and consequences of such
movement.

Assuming postponement of a NAEP-SASS merger to 1992 or beyond,
the intervening years should be devoted to the types of research necessary
to more clearly guide a decision at that time. Much of the judgment
concerning the possibility of merger in 1990 is based on speculation and
essential caution, in the absence of clearly applicable information. In
particular, resources should be devoted to:

1. Study of the effects of seeking information presently collected
from teachers in NAEP and planned for SASS, on teachers' willingness
and ability to provide such information. A carefully planned study
could provide essential information on relationships between
questionnaire length and content and teachers' response rates to the
overall questionnaire, various types of questions, and various
questions. Information that relates questionnaire length to data
quality must also be obtained.

2. Study of the feasibility and costs of providing data-collection
conditions for teachers that enhance response rates and the quality of
data they provide, including alternatives to mailed questionnaires,
payments to schools that would allow hiring of substitute teachers
during data collection, and direct payments to teachers who provide
data.

3. Detailed specification of the purposes to be served by a merger of SASS
and NAEP, including a listing of the research questions to be addressed; the
data series to be established or maintained; articulation of questionnaire
items, data series, and research questions; and articulation of questionnaire
items, research questions, and analytic procedures to be applied.

4. Beginning in 1990 at the latest, common record identification
numbers should be used in NAEP and SASS, so that some data (however
limited) from these surveys can be linked. Although such linking
could not be expected to provide trustworthy national statistics, it
would facilitate exploratory analyses that would [illegible] the
potential benefits of a formal merger of the two surveys. Record
identification should allow both within-survey (vertical) and
between-survey (horizontal) linking of data. In addition to supporting
exploratory analytic studies, such record identification would support
estimation of the degree of respondent overlap and burden that
results when both NAEP and SASS are conducted during the same year
and the comparative completeness and quality of data provided by
teachers, schools and districts that are faced with one survey or both.

It is possible that school principals and superintendents (if not
teachers) would agree to provide a substantial amount of data during a given
academic year, provided they were assured that no federally-initiated data
collection would take place within their schools (or school districts) in off
years. Studies of the willingness of potential respondents to assume more
intensive, but more widely spaced, periodic burden should be undertaken, as
should studies of the potential advantages of using rotating panel designs.
Dr. Leigh Burstein
Center for the Study of Evaluation
UCLA Graduate School of Education
405 Hilgard Avenue
Los Angeles, California 90024-1521

Dear Leigh: 

[illegible] meeting concerning the possible merger of NAEP and SASS
[illegible] Linda Darling-Hammond. We need to [illegible] questionnaires
are not blocks of granite but layers of pebbles that can be used when
appropriate. Just because the questionnaires presently exist as a unit
doesn't mean that they should remain so. It seems illogical to even
maintain a dialogue about the eventual merging of NAEP and SASS as they
presently exist. Your meeting [illegible] a two year period.
[illegible] project to succeed, the response [illegible] on the part of
[illegible] reasonable level. Therefore, your concept of [illegible]
Unfortunately, in my conversation with some CES staff who have presented
these forms before CEIS, there have been instances [illegible] "Our job
is to put [illegible] clearance process, and collect the data. Someone
else will do the analysis." Although this is a paraphrase, I believe
[illegible] last meeting simply asked which items on which forms go with
NAEP and which should be separate [illegible] collected. It is not an
easy task but [illegible] sure that a group that would include you, Linda
[illegible] Planchon could have a good product within a short time.

The first task would be to lay out the analytical framework of what
[illegible] assessment data. Of those, which need to be done biennially
and which less frequently? In my chart I list the former with a + and
the latter with ++.
Only when NAEP + and NAEP ++ have been decided should SASS be considered.
Even the core of SASS shouldn't be collected sooner than every three years!
If the questions and analysis procedures remain consistent, it seems
unlikely that more frequent monitoring will be useful. This press to do it
as often as possible is a legacy of the Congress and other publics receiving
too many conflicting answers to the same question. Once the data base and
the analysis process are credible, educators can spend more time reviewing
successes and providing solutions to problems than dreaming up new ways to
ask questions.

Assuming that the relevant school, principal, teacher and pupil achievement
data via biennial NAEP (mild merger) is in place and that SASS is in place
on a triennial basis, the question remains: are there any reasons why these
two data collections should ever occur during the same year? Perhaps there
is a joint state or national profile that makes sense. I don't know, but
there is time to investigate that possibility using this timeline.

Year    SASS    NAEP

1988     X       X
1989
1990             X+
1991     X
1992             X++
1993
1994     X       X+
1995
1996             X++
1997     X
1998             X+
1999
2000     X       X++
The first NAEP would be the mild, perhaps gentle, merger. SASS waits
until 1991 when CES and others would have had an opportunity to carefully
review the data and market the results. If a more expansive approach can be
justified with NAEP one can wait until 1992. It would not be until 1994
and every six years thereafter that they would occur during the same year --
the case still has to be made for the utility of doing that, or perhaps the
optimal solution is to wait until 1991 and then move SASS to a four year
cycle.
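
As a check on the timeline, the years in which a triennial SASS and a biennial NAEP would fall together can be worked out directly. The short Python sketch below assumes both cycles are counted from 1988 through 2000, as in the chart; the function name `fielding_years` is introduced here for illustration only.

```python
def fielding_years(start, end, interval):
    """Years from start to end (inclusive) at a fixed interval."""
    return list(range(start, end + 1, interval))


sass_years = fielding_years(1988, 2000, 3)  # triennial SASS
naep_years = fielding_years(1988, 2000, 2)  # biennial NAEP

# Years in which both surveys would be fielded.
overlap = sorted(set(sass_years) & set(naep_years))
print(overlap)  # [1988, 1994, 2000]
```

Under these assumptions the two collections coincide in 1988, 1994, and 2000, that is, every six years, because six is the least common multiple of the two- and three-year cycles.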

I could ramble on for a few more paragraphs, but I would simply start to
repeat your comments and Dick Murnane's because they are so appropriate.

Thanks for the opportunity to comment. 




Thomas
Manager 

Student Assessment Section 
December 3, 1987

Leigh Burstein
Center for the Study of Evaluation
Graduate School of Education
University of California
405 Hilgard Avenue
Los Angeles, CA 90024

Dear Leigh:

These rough notes are a response to your materials summarizing the
NAEP/SASS merger meetings. I hope it arrives in time, and I apologize 
for its lateness and roughness. As you know, other events made it 
impossible for me to attend to this until a few days ago. 

I strongly agree with your recommendation #1 that the NAEP and SASS not
be merged at this time. I think, however, that the recommendations to
the Advisory Council should clarify the reasons that various
participants gave for avoiding the merger in more detail than is
provided in your memo of 11/24. In particular:

o Valid analyses of school effects simply cannot be
obtained from a national cross-sectional survey.

o Nonetheless, a merged NAEP/SASS would inevitably
bring out a torrent of invalid but potentially
influential studies of school effects that could
seriously distort policy.

o Merger would seriously threaten the integrity of
the NAEP as an indicator--that is, a descriptive study--
of achievement. One reason is the risk of increased
non-participation because of the increase in
individual-level burden a merger would cause.

With respect to the first of these points, the limits and appropriate
uses of cross-sectional data in general, and of nationally
representative cross-sectional surveys in particular, need to be
articulated more carefully before modifications are made to either NAEP
or SASS, even if full merger is ruled out. There was pleasantly little
opposition in the meetings to the strong position that Tony Bryk, Bill
Schmidt, and I took about the limits of cross-sectional data--that is,
that causal modelling of school effects is an entirely invalid use of
such data. Nonetheless, the discussion of what uses are and are not
appropriate was a bit unfocused, with a lot of alternative dichotomies
(descriptive versus relational analysis, policy analytic versus
school-effects research, etc.) being used without sufficient clarification. I
would suggest the following elaboration.
[illegible] more specifically, the testing of causal
hypotheses concerning the determinants of achievement is simply not an
appropriate use of cross-sectional survey data. Such data can be used
[illegible] hypotheses [illegible]
causal hypotheses does not mean that cross-sectional surveys are
[illegible] an extremely valuable source of descriptive
[illegible] invaluable component of our
all too limited system of indicators of student achievement.

[illegible] meeting, you responded to this point by noting
[illegible] what is meant by "descriptive." The
term is often used disparagingly [illegible] studies are called
"only descriptive." Moreover, there is a widespread view that
[illegible] technically simple, comprising bivariate
cross-tabs and the like. In fact, neither view is warranted.
Descriptive studies are simply those that attempt to figure out what a
[illegible]

o They shape further inquiry, by generating hypotheses
and guiding other forms of research (such as smaller
longitudinal studies designed to assess causal hypotheses); and

o They can provide valuable information for policy formation.

[illegible] be technically complex. There is no
reason, for example, why descriptive studies need to be only bi- or
[illegible] multivariate studies that purport to be
testing causal hypotheses are actually valuable because of the
descriptive information they provide.

[illegible] recently published a study in ER that
[illegible] districts produce low levels of achievement,
[illegible] per-pupil expenditures, and that
[illegible] significantly associated with achievement when
[illegible] controlled. The data were cross-sectional
[illegible] study cannot
[illegible] the hypothesis that district size somehow causes
[illegible] revenues, although it certainly makes that
hypothesis more attractive. Nonetheless, it is valuable as a
[illegible] descriptive study, for it shows that certain important
[illegible] conditioned on
some important confounded variables.
If you accept this viewpoint, the purpose of relational analyses of
databases such as the NAEP is to provide what could be called
"conditional descriptive" information. For example, it is valuable (for
policy as well as to guide other types of research) to explore the
distribution of achievement, conditioned on ethnicity, region, type of
school, and so on. These conditional descriptive analyses can of course
be multivariate, subject to the limitations imposed by sample size and
design and characteristics of the variables.

This then leaves us at the point where both the Wednesday and the Friday
meetings came to a nearly dead end: what, precisely, are the
conditional analyses of student achievement or school and teacher
characteristics that we need both for policy and to guide other
research? It is easy to come up with examples for the NAEP -- that is,
instances in which we need assessments of achievement conditioned on
school, community, and other variables. Ethnicity is a good example.
In addition to bivariate tabs and trend analyses conditioned on
ethnicity (e.g., are blacks continuing to gain on whites?), it is
important to consider a variety of trivariate relationships. For
example, have the relative gains of black students been greater in
high-minority or low-minority schools? In certain regions?

Examples where it would be productive to condition SASS analyses on
achievement are less obvious (and probably far less numerous), but they
exist. For example, if we want to track the flow of teachers with
different characteristics into various types of classrooms within
schools or schools within districts, the level of achievement of the
students they are assigned is an obvious variable to include.

I think that the required next step is to rethink systematically what
conditional descriptive analyses are important for both of the two
purposes noted above, and to compare the results of that effort to the
current variable lists for both the SASS and the NAEP. I think that the
NAEP end of this should be relatively straightforward and might lead to
the conclusion that the non-outcome variable set needs modification,
perhaps by adding or substituting SASS items. The SASS end will prove
far more difficult, for incorporating meaningful achievement measures
into the SASS would be incomparably more difficult and more expensive
than incorporating SASS items in the NAEP background variable set.

Give me a call if you would like to talk these issues over further.



Sincerely, 




Daniel Koretz 
November lO, 1987

Richard J. Murnane
Harvard University Graduate School of Education
Cambridge, MA 02138

Should NAEP and SASS Be Merged?

I recommend that NAEP and SASS not be merged at this time.
I base this recommendation on an assessment of the probable
[illegible] and possible costs. This memo sets
out the reasons for my recommendation.

NAEP provides the most important information on the
cognitive skill levels of American school children.
Consequently, any change in design that threatens the ability of
[illegible] unbiased information on achievement must be
[illegible] benefits [illegible] potential
[illegible] potential for damage in the
form [illegible] compliance by school personnel,
particularly teachers, is significant. It took teachers [illegible]
questionnaire during the pre-test. If teachers are expected to
complete this [illegible] carefully, and provide other information for NAEP
about teaching techniques, this burden may simply be too great
for [illegible] teachers. Moreover, it is likely
that teachers who do not provide complete cooperation will
be teachers with particular characteristics, or teachers who work
in particular types of school settings. Thus, such
[illegible] complete cooperation, could
[illegible] sample design, and make it impossible to make
valid inferences about the nation from the sample.

[illegible] consideration is that NAEP is undergoing a change
[illegible] state-by-state comparisons. This change
[illegible] concerning sample design and
drawing inferences about the population from the samples. It
[illegible] major changes in [illegible]

[illegible] the potential problems associated with merging SASS with
NAEP are significant. In my assessment, the potential benefits
are not commensurate with the potential problems. Let me
consider three types of potential benefits in turn: increased
analytical power, reduced respondent burden, savings on cost of
administration.

Increased analytical power?

Merging SASS and NAEP will not enhance greatly the extent to
[illegible] For such causal modelling, longitudinal data on students'
[illegible] are needed, such as are provided by HS&B and NELS.
[illegible] spiraling used in administering
the NAEP test items means that only a very few test items could
characteristics and rate of turnover must be viewed as [illegible]
decisions teachers make about where they want to work and [illegible]
decisions school districts make about whom they want to employ.
SASS may support research on the factors that [illegible] teachers'
and school districts' decisions. But it seems unlikely that
models of this set of [illegible]

Reduced Respondent Burden?

[illegible] the primary question that CES should ask is whether the
burden of participating in surveys will lead respondents to [illegible]
in a manner that jeopardizes the usefulness of the survey
information. Actions that threaten the survey include
[illegible] teachers and
administrators [illegible]

Reduced Cost of Administration?

[illegible] cost must be taken [illegible] fielding of the
instruments will [illegible]
concerns the [illegible] lengthy
and complex instrument. It took respondents [illegible] to
complete this instrument during the pre-test. [illegible]
first administration has been completed and the data have been
[illegible] large respondent burden. [illegible]

[illegible] about SASS, it seems [illegible] for two reasons [illegible].
First, [illegible] SASS could jeopardize the quality of [illegible]
the achievement of the nation's school children. Second, it
seems as if it would be easier to [illegible] problems that may
arise in fielding SASS if these problems are not complicated by
issues of integration with NAEP.

Altering the sample design for NAEP to accommodate state-by-state
comparisons is a significant [illegible]. [illegible]
the new instruments that are part of SASS [illegible] a major task.
Merging the two surveys [illegible] will be met successfully.




UNITED STATES DEPARTMENT OF EDUCATION

OFFICE OF EDUCATIONAL RESEARCH AND IMPROVEMENT



December 2, 1987



Leigh Burstein, Co-Director 
CRESST Quality Indicators Study Group 
Center for the Study of Evaluation
UCLA Graduate School of Education
Los Angeles, CA 90024 



Dear Leigh: 



I appreciate being included in the NAEP/SASS merger meeting. While 
you did not specifically request feedback from in-house participants, 
my perceptions as a relatively naive newcomer may provide you with a 
perspective that you would not get otherwise. First I'll summarize 
what I heard. Then I'll summarize what I think.

What I Heard

There was general consensus that the separate, primary purposes of
both NAEP and SASS are important. If (or however) a merger is
implemented, the integrity of those separate purposes should not be
compromised.

A recurring set of concerns focused on the relationships to be studied
if a merger occurs. What relationships between and among variables
will be examined and, more important, why? There also seemed to be
concern that any reported relationships might be misinterpreted
as "causal."

A second set of concerns focused on the burden of data collection on
NAEP and/or SASS participants. The greatest fear seemed to be a
potentially negative correlation between burden (actual or perceived)
and validity of the data. There was also some concern regarding the
[illegible] schools for participation. Pertinent information in readily
useable form was suggested as meaningful remuneration.

A number of participants suggested using existing NAEP data to inform
[illegible] concerning a NAEP/SASS merger. Specific studies using
NAEP data could focus a series of research questions about American
schools that might, or might not, be efficiently/effectively addressed
via some form of merger (e.g., strong, [illegible]).

[illegible] questioned which SASS components should be
used in [illegible]. There was general agreement that not all
aspects of SASS will need to be included in answering the questions
addressed by the merger; hence, the continuing question -- what
questions/relationships will any merger specifically address?



What I Think 



NAEP was designed to provide national level data on student 
stafrcJar^;^.'''? i'^'^^^^ provide estima?2s of school and 



— — — w IllVt 

national, level decision making. 
puf^os^r^Jl'the iStent'o? "^^^ '^^^^ SASS 

?etSelrstuL^t^^^i^e^%^°t' ^nS^iL^^ ^oeror!r^^h::iri":^'^^ot 

™ent 'S^fens'iL'/'''' (st?ong,°;;ild"/or°:th;r;i an 

Shy? S^at rl auin hinf ^ho";; k"'^' relationships will be examined? 

.xLinrs^^^r^^i:u"^s^cl e%\-i^ned^^ :e?:ji:^shi^s^^^he^sL^5^ -^li:^' 
BJ^^r^^^ ------ ^^^J^'tl::^ 

For example, knowing that years of teaching experience and student
achievement are positively related does not tell me why studying that
relationship, or any other (yet unspecified) relationship, is important.
I am concerned that the available data (NAEP and SASS) will drive the
questions asked and that the answers to those questions may have
little meaningful impact on what happens to kids in schools.

Two further issues were not explicitly expressed in the
meeting I attended (11/20/87). Those issues concern: (a) the
difficulty of uniquely attributing particular student achievements to
particular sources (e.g., particular teachers) and (b) the fact that 
student achievement tests are designed to assess students' (limited) 
knowledge, not teaching or schooling effectiveness. I raise these
issues because it makes intuitive sense that relationships between
student achievement and other variables (e.g., teacher/school
characteristics) indicate that: (a) particular student outcomes may
be attributed to particular types of teachers or schools when, in 
fact, the variance not accounted for may be more informative than the 
variance shared and (b) student achievement test scores provide 
acceptable indicators of the effects of teaching and schooling when, 
in fact, such scores are but proxy measures. 

I look forward to receiving your summary of the merger meetings! 



Sincerely, 




Doris Redfield, Ph.D. 
OERI-CRESST Liaison 







STATE OF SOUTH CAROLINA

DEPARTMENT OF EDUCATION 

COLUMBIA 29201 



Charlie G. Williams
State Superintendent of Education



November 19, 1987 



Dr. Leigh Burstein, Co-Director 
CRESST 

Graduate School of Education 
University of California, Los Angeles 
405 Hilgard Avenue 
Los Angeles, California 90024-1521

Dear Leigh: 



This is in response to your request to attempt to capture the essence of
the discussion concerning the proposed merger of the
SASS and NAEP samples. First, I believe there was strong consensus, if not
unanimity, that a complete merger of NAEP and SASS is not desirable. The
major points, as I recall them, in support of that position are as follows.

1. The case for a merger rests on the assumptions that
NAEP and SASS should be conducted with the same frequency and
concurrently. No compelling arguments have been advanced to support 
either of those assumptions. 

If the SASS and NAEP are to be conducted biennially, they should be 
scheduled in alternate years to spread data burden rather than 
concentrating the burden through merging the samples. 

2. Merging the samples for the purpose of providing data for
relational studies seems ill-advised.

Although such studies may be informative and desirable, data bases of
the magnitude of those generated from SASS and NAEP are not necessary
for their conduct. The studies can be more effectively, and probably
more efficiently, conducted with smaller samples and stricter controls
than those provided by NAEP and SASS.

In addition to the issues of efficiency and effectiveness, the following 
points were made relative to relational studies: a) the strength of
relationships between student achievement and other variables is not
likely to change significantly in the short-term. Consequently, there
is little need for frequent re-examination of the
relationships; b) the existence of large data bases linking teacher and
school characteristics to student achievement may lead to inappropriate
analyses and erroneous conclusions due to the temptation to apply a
causal model to the interpretation of correlational studies; c)
relational studies are most appropriately conducted on a longitudinal



The iwxt cycle of sass .h ,J necessary to inform policy, 

obtei"" fron the 198;.8« . T """"■I'" ""til the data to L 
influ^ce tSe Sesi^' available to 

"l°SfJ t «a"sf"ctor"''," "'V'l '° J"' °' 

impacting partic!J:("rU'tL ""nIe/is to g^eVt" 't'o^'ris^-je^f^^ 1"'^^ 
the assessments by mernlne the .»m„l.. ! 5 Jeopardizing 

burden. This is of specfal\o"ern « thL m . 

bein, proposed in the^N^P .VlVol^.Vsl'Al bnt:;r:o:irL::;r'" 

participants irthe Sltf ^.th " «<">««ed by burdening 

tangentLuy rS.t'ed to Nip^'tjo'^J^ri ' " 

discustL'°;le;!;!ng'trissrr"VhorVh"t'^h'° " 
POints.^_If any of 4 ™^id^°L^i^^•^ttn'/p^e^Ve"coX^J^^"s^r 

Sincerely, 

Paul D. Sandifer, Director
Office of Research 

/tnb 
November 20, 1987



Dr. Leigh Burstein 

Center for the Study of Evaluation 

UCLA Graduate School of Education 

145 Moore Hall 

405 Hilgard Avenue

Los Angeles, CA 90024-1522

RE: NAEP/SASS Merger

Dear Leigh: 

I am convinced by the many and varied issue papers that the merger
idea has almost no redeeming value.

There appear to be two possible reasons for the merger:
that it would open up new research territory and that it
would be less costly to do the merged surveys than it
would be to do the two separately. Neither reason holds up.

1. Value to research: The long and fruitless history of
attempts to relate teacher and staff characteristics and
behavior as assessed by large scale survey instruments to
cross sectionally gathered student outcome data should have
convinced us long ago that it is only a mechanism for
generating meaningless correlation coefficients. Our theory
and our measurement sophistication are simply too weak to
overcome the inherent difficulties in attempting to
understand causal relationships with cross sectional survey
data. The paper "Issues in the Combination of
NAEP/SASS: Conceptual Issues" raises the proper issue in a
carefully skeptical manner: "It will be necessary to
determine the extent to which a cross-sectional data set
would be an appropriate vehicle for investigating correlates
of achievement". The NELS:88 and the earlier HSB
longitudinal surveys are far better for such studies.

The only research reason I can imagine for combining the 
surveys is to study the distributions of educational 

various sub-groups in the population — 

br^^!li«2i ^" ^^^^ Report. This might 

be accomplished more simply by augmenting NAEP with a few
carefully selected questions and perhaps with a school
representative survey of teachers.
2. Cost: I suppose the cost might be less.
The number of sampling problems set out
throughout the issue papers, however, is sufficient to
convince me that there is a substantial chance of failure of
the effort. There is a tremendous risk in putting all of
the eggs in one weak basket. In light of the apparently
very sensitive nature of the NAEP data collection the
problem seems overwhelming. After all, at the present time
we are not sure even of our capacity to carry out NAEP
without a hitch. Multiplied by 50 to obtain state
representative samples for NAEP, the proposal to combine the
surveys seems like sheer folly. If we were to set up a
bet, my prior is that the probability of a partial
or complete breakdown of the combined survey would approach
1.0 and there is a considerable chance that the breakdown
could occur and not be identified for some time.



Best wishes, 

Marshall S. Smith
Dean 



Policy Studies Associates, Inc.
December 3, 1987



To: Leigh Burstein 

From: Brenda J. Turnbull

Subject: Merger of NAEP and SASS 

The participants in the meeting you chaired for CES on issues in
merging NAEP and SASS raised a number of very
important points that will surely be of interest to CES's
Advisory Council. In this memo, I would like to emphasize a procedural
suggestion for CES to consider in continuing its deliberations.

I believe that the deliberations on merging NAEP and SASS should begin with
a systematic analysis of the questions that CES would like to answer: in
short, an analysis plan for a hypothetical data
set. The time invested in such a plan would greatly clarify both the
potential and the limits of addressing these questions with
nationally representative (and state-representative) data. I think the
limits would come into sharp focus and would provide good reasons not to
merge. If the merger were to go forward, then the planning
effort would have laid important groundwork for the eventual data analyses.

Writing an analysis plan would, for example, force CES staff to think
through a model of the determinants of student achievement. Such a model
has a bearing on decisions about data collection, as our meeting
made clear. A model of learning as a long-term process leads to this
conclusion: cross-sectional data on the characteristics of teachers and
schools can help in analyzing the factors that contribute to student
achievement only to a limited extent in the absence of
data on students' beginning achievement levels, which were shaped by
their home backgrounds and previous schooling. By anticipating
the causal statements that could conceivably emerge from the
analysis of a merged NAEP and SASS effort, I think CES would find these
statements would be so hedged with caveats as to be fairly useless.

An analysis plan could do other things as well: 

o It could include many descriptive questions that a merged data set 
could answer. This would include questions about the types of 
students who receive instruction from teachers with particular 
backgrounds and qualifications. However, even in this area a good
plan would consider the extent to which this data set would
capture the important variation within schools. 
o To the extent that CES wants to investigate causal relationships
between schooling variables and achievement variables, the
analysis plan should identify any such relationships that are best
addressed with a nationally representative or state-
representative snapshot, as opposed to smaller-scale or 
longitudinal studies. 

o It would include consideration of how often the data need to be 
updated. As we discussed briefly at the meeting, trends of the 
sort that SASS will capture may not be so fast-changing that they 
require data points every two years, particularly with samples 
that are representative of every state. A three- or four-year 
cycle might be perfectly adequate. Collecting representative data 
for only a subset of states in each cycle might be another 
possibility, if the data could be weighted in such a way as to be 
nationally representative. 

Another immediate step for CES would be to look at the data already in hand
from teachers and administrators in the NAEP sample. What questions can
these data answer? How do they need to be supplemented? Is a merger with
SASS a way of supplementing them, or would smaller, more focused studies do
the job better? 

In summary, I think CES has begun to ask good questions about the wisdom of
merging two large national efforts. Your summary of our meeting and the
other written comments will give the Advisory Council a good set of
arguments to ponder. My aim in this memo has been to suggest that good
research management really has to work backwards: to begin with a set of
questions one would like to answer, to construct analytic models that can 
answer the questions defensibly, and only then to plan the data collection 
that will fit the models. In this planning context, the considerable 
respondent burdens of national studies can be weighed and justified. 
Merger of NAEP and School/Staffing Surveys

The Center for Education Statistics is developing new data collection systems
responsive to statistics needs of diverse users. Among other things, the
Center is assessing the feasibility of a policy to begin combining, in 1990,
the National Assessment of Educational Progress (NAEP) with the new School
and Staffing Survey (SASS).

As the Center progresses through this exercise with NAEP and SASS, there are 
three goals it is trying to achieve: 

1) Collection and maintenance of a unified data set that could relate
specific policies, mixes of resources, and changes in the
instructional system to outcomes;

2) Lessening of burden on schools, school districts, and teachers; and

3) Reduction of costs to the Federal Government for the collection of
these data.

While the goals appear valid and desirable on their face, they raise questions
of "why", "to what extent", and "how." Some questions, concerns and issues
include the following:

1) How can the Center deal with the conceptual distinction between
surveys with different purposes and divergent universes: (a) one
sample of all schools with grades in range of K-12 for SASS and
(b) three individual samples of U.S. schools for 4th and 8th grades
and 12th grade for NAEP?

2) Is the assumed reduction in data burden by combining the surveys in
1990 really a shift in burden (fewer schools but more burden in
each school)? Will schools actually perceive a huge increase in
burden when they are included in the sample? And, if so, would the
quality of responses be affected for any of the parties (i.e.,
administrators, NAEP teachers, other teachers, students)?

3) Following the data quality question, above, should participating
schools be in rotating panels beginning in 1990 so studies of change
can be enhanced or does the data burden issue demand that each data
collection be from a fresh sample?

4) Year 1990 is intended to be a practical trial of a State
representative NAEP (one course in one grade) together with merged data
collection about schools and teachers. Considering that the remaining
Schools and Staffing data will be collected by a separate contractor,
what technical and management questions should be addressed (e.g.,
common instruments processed independently/or by one contractor for
inclusion into the data base)?

5) Will the integration necessitate design changes that will shift
emphasis from the primary goals of each of the individual surveys?



6) Assuming a longer national survey and a shorter State survey of
teachers and students outcomes, what would be the consequences of
examining relational questions at the national level vs. on a State-
by-State level (assuming that the sample for the NAEP portion is a
State sample in 1990 or 1992)?

7) Should the cluster size in the teacher sample be increased to permit
statements about the set of teachers in a school? The issue is one
of being able to represent the set of teachers as a characteristic
of a school, rather than having only a small cluster of teachers
that would allow statements about teachers in general with no link
to specific schools.

Given that NAEP samples teachers of students to describe teaching
methods and SASS samples teachers in schools to determine teacher
characteristics, can these two goals be achieved with a common
sample?

8) How can this merged system best be managed, given that it requires
(a) test administration, (b) surveys to be completed by students,
teachers and administrators, (c) large scale data management and
(d) both grantee managed NAEP and Federally managed SASS components?
Issues in the Combination of NAEP/SASS



Conceptual Issues



This issue paper deals with four sets of conceptual issues related to
the merger and use of the data in a merged NAEP/SASS.

1. Compatibility of Objectives

In 1978, in its continuing quest for comprehensive and dependable
information on student achievement, in Section 405 (k) of the GEPA,
Congress specifically directed NAEP to carry out certain assessment
activities:

o collect and report at least once every five years
data assessing the performance of students at
various age or grade levels in each of the areas of
reading, writing and mathematics;

o report periodically data on changes in knowledge
and skills of such students over a period of time;

o conduct special assessment of other educational
areas as the need for additional national
information arises;

o provide technical assistance to State educational
agencies and to local educational agencies on the
use of National Assessment objectives, primarily
pertaining to the basic skills of reading,
mathematics, and communication and on making
comparisons of such assessment with the national
profile and change data developed by National
Assessment.

Historically, NAEP has collected some information on characteristics of
respondents' communities, including the region of the country in which the
community is located, its size, and socioeconomic status. NAEP has in addition
measured a few student background variables, such as race and ethnicity, age,
and parents' educational attainments. The objective of this collection of
background variables is to be able to translate them, together with the
assessments, into meaningful guides to educational practitioners for the
improvement of education.

School and Staffing Survey has as its immediate objective to create a
comprehensive data base that can be used to (1) profile the nation's elementary
and secondary teaching force; (2) enhance assessments of teacher supply and
demand by teaching field, level and location; and (3) examine school policies
and practices, administrator characteristics, and teacher workplace
conditions. The ultimate objective to which the SASS data contribute, along
with other data acquired in CES surveys, is the discovery of those conditions,
methods and practices that seem to make for better and more effective teaching
and learning in the nation's schools and to make that information available to
those who make policies for, and those who operate, the educational enterprise.
To achieve the objectives to which the School and Staffing Survey
contributes, it is necessary to measure the effectiveness of teaching and
learning in the nation's schools; i.e., to assess educational progress, and to
be able to relate differences in effectiveness to the varying characteristics
of teachers, administrators, schools, and the community. To have a separate,
and even partially duplicative, student evaluation as a part of the SASS is
unacceptable from the standpoints of cost to the government and burden on the
schools, teachers and students. Therefore, CES must explore the questions of
links between NAEP and SASS, including particularly costs and feasibility.

However great may be the compatibility of the objectives of NAEP and SASS,
there remain great difficulties in making the process and procedures equally
compatible, and there are some who have grave reservations that making NAEP
data useful for a greater range of purposes will undermine the assessment's
capacity to perform its basic mission effectively. There is concern that the
dilution of resources and distortion of purposes can result from extensive use
of NAEP for district or school building comparisons, or from efforts to link
NAEP to other assessments or data collection efforts.

There are two very specific procedural considerations in combining NAEP
and SASS samples. The NAEP sample of schools is limited to schools containing
4th, 8th, and 12th grades; the sample of teachers is derived from the sample of
students within the schools. In contrast, the SASS school sample includes all
schools, and the teacher sample is a probability sample of teachers in all
grades. To accommodate these differences while maximizing the utility of the
data acquired, it will be necessary to analyze the costs, burdens and benefits
of a variety of sampling approaches.

Finally, there is the problem of linking one survey process that is
deliberately insulated from Federal operation so that there will be no "Federal
test of students or Federal evaluation of teaching methods" and another survey
process that is operated directly, or through a contractor, by the Federal
government.



2. Potential Added Value of a Merger

The potential analytic advantage of merging SASS and NAEP is that the
resulting dataset would contain more comprehensive information, and therefore
would permit the investigation of more relational issues. There are two
distinct ways in which this would come about: by increasing the information
base at a given organizational level, and by permitting a new combination of
organizational levels to be studied. The organizational levels of interest
here include the student level, the teacher level, the school level, and the
district level. The relational issues are primarily those of studying
correlates of student educational achievement.

Increasing the information base at a given organizational level applies
particularly to teacher information. NAEP currently permits student outcomes
to be related to a small set of teacher variables, e.g., measures of special
training. Merger with SASS would introduce additional teacher variables, such
as:

o Teaching Status

o Teaching Experience

o Teaching Load

These additional variables could serve as potential predictors of student
outcomes, either singly or through development of multivariate models.



The new combination of organizational levels that would result from a
SASS/NAEP merger is the student/district combination. The merged dataset would
allow study of district variables as predictors of student outcomes. This is
not currently possible, since NAEP does not collect data at the district level
and SASS does not collect data on student outcomes (except for overall
graduation rates and college application rates). District characteristics that
could be related to student achievement include:

o Teacher Pay Scales 

o Graduation Requirements 

o Hiring and Retirement Policies

Again, the additional variables might be of interest as individual predictors
or as components of multivariate models.

There are two basic questions that might be considered in this context.
First, how valuable would the additional analytic capabilities resulting from
the merger be? Second, to the extent that they are valuable, is it better to
merge the two surveys or to simply augment NAEP to include more potential
predictors of student achievement?

3. Types of Relationships to be Investigated

The preceding issue, the potential added value of a merger, is
somewhat abstract, in that it addresses the general value of relating student
outcomes to variables measured at higher levels of aggregation. It is also
necessary to consider the potential utility of studying specific relationships,
and to decide whether a combined SASS/NAEP data set is the best vehicle for this
endeavor. Although this paper is not the appropriate place for setting out a
list of specific relationships that might be studied, it does seem valuable to
consider a dichotomization of relationships into those that are established and
those that are hypothetical.

An established relationship, e.g., the effect of instructional time on
achievement levels, could be addressed in two ways: It could be further
confirmed, or it could be refined and studied in finer detail. Further
confirmation would entail extending the results of case studies or of
relatively limited surveys to a national population. Refinement would involve,
for instance, establishing the differential effect of instructional time on
different sub-populations, e.g., on different ethnic groups or in different
regions of the country.

A case can be made, however, for not conducting this type of research.
The alternative would be to accept an established relationship as given and to
simply measure the indicator, i.e., the correlate of achievement. If this
approach were taken, then the case for merging SASS and NAEP would be less
strong.

Alternatively, the combined dataset could be viewed more in terms of
exploratory analysis, i.e., as a tool for formulating and testing new
relationships. Although new relationships do not necessarily imply new data
elements, they tend to do so, and the exploratory approach could well lead to
lengthier survey instruments. This could lead to valuable research results.
On the other hand, it might be more effective to conduct this research through
special studies, rather than appending it to a major national survey.

4. Utility of Cross-Sectional Data

When investigating the correlates of educational achievement, it must be
recognized that current achievement level is not simply a function of the
current educational environment. It is, rather, a cumulative function of
educational inputs that started in kindergarten or earlier.

Longitudinal studies, e.g., NELS, can measure educational inputs over a
period of years, and attempt to develop models that predict or explain
variation in educational attainment. Alternatively, studies that include
pretests and posttests can measure changes in educational attainment, and
relate these to current inputs.

Both SASS and NAEP are cross-sectional studies, and will remain such,
whether they are combined or kept separate. It will be necessary to determine
the extent to which a cross-sectional dataset would be an appropriate vehicle
for investigating correlates of achievement, and to consider enhancements that
might make the dataset more appropriate.
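
The point about cumulative inputs can be illustrated with a small sketch. It is only an illustration under stated assumptions: achievement is taken to be the simple running sum of yearly inputs, and the two students, their input values, and the function name are hypothetical, not drawn from NAEP, SASS, or NELS.

```python
from itertools import accumulate


def cumulative_achievement(yearly_inputs):
    """Achievement at the end of each year, modeled (as an assumption)
    as the running sum of that student's yearly educational inputs."""
    return list(accumulate(yearly_inputs))


# Two hypothetical students with the same current-year input (5 units)
# but different earlier histories.
student_a = [2, 2, 2, 5]
student_b = [6, 6, 6, 5]

ach_a = cumulative_achievement(student_a)
ach_b = cumulative_achievement(student_b)

# A cross-sectional survey observes only (current input, current level).
cross_section = [(student_a[-1], ach_a[-1]), (student_b[-1], ach_b[-1])]

# A pretest/posttest design observes the gain over the final year instead.
gains = [ach_a[-1] - ach_a[-2], ach_b[-1] - ach_b[-2]]

print(cross_section)
print(gains)
```

In this toy case the cross-sectional view pairs identical current inputs with very different achievement levels, while the gain scores are identical. That is the sense in which a cross-sectional dataset cannot separate the effect of current inputs from the effect of earlier ones.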
Linkage - Merger of NAEP/SASS
Item B. Merger Design Issues 



ISSUES IN THE COMBINATION OF NAEP AND SASS 
Global Sampling Issues 

0. The primary issue to be resolved is that the samples for NAEP
and SASS were originally designed for two different purposes.
There is some concern regarding combining the two surveys,
since the final sample design for the combined surveys would
necessarily be a compromise which may satisfy neither set of
goals. Items I. - IV. below describe the essential
differences between the two survey sample designs, and item V.
describes what compromises look most appealing at this time.

I. Question 1: How can the Center deal with the conceptual
distinction between surveys with different purposes and
divergent universes?

Issue: NAEP and SASS currently use different sampling frames.
A sample design that would be used for both surveys must meet
the needs of both surveys. Since the universes are different,
this means that we would like to maximize the overlap between
the two frames and samples, and use stratification to define
relevant sets to use in estimation.

NAEP studies three universes: 

1. the set of all schools which have a grade four;

2. the set of all schools which have a grade eight; and 

3. the set of all schools which have a grade twelve. 

whereas SASS studies one universe: the set of all schools
which have any grade in the range K-12 inclusive. The sample
design must accommodate both (or all) universes to allow
estimation for the entire U.S., while at the same time
allowing the time series established for NAEP to be continued.
The schools which fall into at least
one of the three NAEP universes comprise 96.2 percent of all
schools in the U.S. which have a grade in the range K-12.
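
The relation between the three NAEP universes and the single SASS universe can be sketched as follows. This is a minimal illustration, not the Center's procedure: the grade coding (kindergarten as 0), the function names, and the four-school toy frame are all assumptions, and the toy frame does not reproduce the 96.2 percent figure cited above.

```python
NAEP_GRADES = (4, 8, 12)


def naep_universes(grades):
    """Return the NAEP grade universes (4, 8, 12) that a school falls into."""
    return {g for g in NAEP_GRADES if g in grades}


def in_sass_universe(grades):
    """A school is in the SASS universe if it has any grade K-12 (K coded as 0)."""
    return any(0 <= g <= 12 for g in grades)


def naep_coverage(schools):
    """Share of SASS-universe schools that fall into at least one NAEP universe."""
    sass = [s for s in schools if in_sass_universe(s)]
    if not sass:
        return 0.0
    covered = [s for s in sass if naep_universes(s)]
    return len(covered) / len(sass)


# Hypothetical toy frame: each school is the set of grades it offers.
schools = [
    set(range(0, 6)),    # K-5: grade 4 universe
    set(range(6, 9)),    # 6-8: grade 8 universe
    set(range(9, 13)),   # 9-12: grade 12 universe
    set(range(0, 4)),    # K-3: in SASS, but in no NAEP universe
]

print(naep_coverage(schools))
```

The membership set returned by `naep_universes` is also the kind of variable that could define strata in a combined frame, since a school may belong to one, several, or none of the NAEP universes.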



Question Numbers refer to numbers used on the document: Merger of 
NAEP and School/Staffing Surveys 
II. Question 6: Assuming a longer national survey and a shorter
State survey, what would be the consequences of examining
relational questions at the national level vs. on a State-by-
State level?

Issue: NAEP and SASS currently provide estimates at different
levels of aggregation. The design of the sample for SASS
allowed for State-by-State comparisons for schools and
teachers, whereas the other comparisons to be made from the
survey (e.g. public vs. private schools and teachers) were
only incorporated into the design assuming national level
comparisons. The issue here is really the importance of the
relational questions relative to other goals from NAEP and SASS.

NAEP for 1990 is intended to be: 

1. Nationally representative for grades four and eight; 

2. State representative for the assessment of progress in 
mathematics for grade twelve; and 

3. Nationally representative for all other assessments in 
grade twelve. 

SASS for 1990 is designed to provide: 

1. National estimates for characteristics of schools 

2. National estimates for characteristics of teachers 

3. State comparisons for characteristics of schools 

4. State comparisons for characteristics of teachers 

5. National level comparisons between public and private 
schools 

6. National level comparisons between public and private 
school teachers 

7. National level comparisons between elementary and secondary 
schools 

8. National level comparisons between elementary and secondary
school teachers

9. National level comparisons between fields taught for
secondary school teachers

SASS can also provide national level comparisons; for example, 
it can be used to make comparisons of large vs. small schools 
or teachers, or for urban vs. rural schools or teachers, but 
the sample design for the 1988 survey did not explicitly 
account for these comparisons. 
III. Question 5: Will the integration necessitate design
changes that will shift emphasis from the primary goals of
each of the individual surveys?

Issue: NAEP and SASS currently provide estimates for different
substantive populations. NAEP provides estimates of:

1. the assessment of progress for students; 

2. the characteristics of teachers as they relate to 
progress. 

NAEP can provide estimates at the national level for school or
school characteristics, and at some levels below national
(e.g. regional, urban/rural), but the sample is not well
balanced across states for the school estimates. This is due
in part to the emphasis on teachers and students, and in part
because of the clustered nature of the sample, where counties
are used as the first stage units.

SASS provides estimates of:

1. characteristics of teachers, and 

2. characteristics of schools and school districts. 

An integrated survey would attempt to optimize the sample so
as to provide the best estimates for all of these goals, while
at the same time considering some of the relational issues.
The last issue (II. above) focused on the relative importance
of the level of aggregation. This issue is more concerned
with the relative importance of the variables being studied at
the same level of aggregation.

IV. Question 3: relieving the data quality question, above, should
participating schools be in rotating panels beginning in 1990
so studies of change can be enhanced or does the data burden
issue demand that each data collection be from a fresh sample?

Issue: NAEP and SASS are both recurring surveys, but neither
of the current sample designs takes account of the possible
efficiencies of a rotation design. Both designs call for
unduplication between the two surveys and NELS:88 in 1988.

The NAEP sample design selects counties or groups of counties
as the first stage of selection, with schools at the second
stage clustered within counties (initially thought to keep
test costs down, though this point is under contention now).
SASS is designed selecting schools as PSU's from a list, with
an area frame supplementation for private schools. If a
rotating design were to be implemented, to meet the objectives
of both surveys, the rotation could occur at either of two
levels: the county level and the school level. Determination
of the design for the combined surveys will be a function of
the costs and size of the survey.
If the combined survey were a state sample for all grades, it
may be that there would be enough schools in sample that the
cost savings realized from the elimination of between-county
travel would vanish. If the combined survey were only a
national sample for parts of NAEP, it may be that savings
would be substantial for NAEP to start with a sample of
counties as PSU's. However, the advantages of a rotating
design are reduced for NAEP because different topics are
assessed each time; the recurrence of topics is staggered.

V. Current proposals: some ideas on a combined design. 

The sampling frame for the combined surveys would be all 
schools with any grade in the K-12 range. Schools would be 
allocated to multivariate strata, with one of the
stratification variables being whether a school has a grade 4,
8, or 12, some combination of 4, 8, or 12, or none of these.
Estimates would be made for SASS from the entire sample.
Estimates would be made for NAEP for grades 4, 8 and 12 using 
only the appropriate strata. 
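
The grade-based stratification variable described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the school identifiers, the grade sets, and the function names (`grade_stratum`, `stratify`, `naep_frame`) are hypothetical and do not come from the NAEP or SASS designs, and a real design would cross this variable with the other stratifiers.

```python
from collections import defaultdict

NAEP_GRADES = (4, 8, 12)

def grade_stratum(grades_offered):
    """Label a school by which of grades 4, 8, and 12 it offers."""
    present = [g for g in NAEP_GRADES if g in grades_offered]
    return "+".join(str(g) for g in present) if present else "none"

def stratify(schools):
    """Group school IDs by grade stratum.

    `schools` maps a school ID to the set of grades it offers
    (0 stands for kindergarten).
    """
    strata = defaultdict(list)
    for school_id, grades in schools.items():
        strata[grade_stratum(grades)].append(school_id)
    return dict(strata)

def naep_frame(strata, grade):
    """School IDs in the strata that contain the given NAEP grade."""
    return sorted(
        school_id
        for label, ids in strata.items()
        if str(grade) in label.split("+")
        for school_id in ids
    )

# Hypothetical frame of five schools.
schools = {
    "A": set(range(0, 6)),    # K-5
    "B": set(range(6, 9)),    # 6-8
    "C": set(range(9, 13)),   # 9-12
    "D": set(range(0, 13)),   # K-12
    "E": set(range(0, 4)),    # K-3, no NAEP grade
}
strata = stratify(schools)
sass_frame = sorted(schools)          # SASS would use every stratum
grade4_frame = naep_frame(strata, 4)  # NAEP grade 4 uses only matching strata
```

In this sketch school E falls in the "none" stratum, so it would contribute to SASS estimates but to no NAEP grade estimate.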

Determination of the number of schools and teachers to be 
sampled will be a function of several factors: 

1. The cost of interviewing schools, teachers, and students.
2. The types of analysis to be conducted using schools,
teachers, and students, and the relative importance of
each of these analyses.

For 1990, the sample for the combined survey should be a 
national sample with state supplementation for the portions 
of SASS and NAEP that will require state estimates. 

The determination of whether a rotating design should be used 
will be a function of: 

1. Whether the analysis of data from the combined sample will 
have a component related to school context. 

2. Whether there may be a problem with burden if schools are 
sampled repeatedly. This may be a nonissue if large 
schools will fall into sample with certainty for a state
sample. 
Merger Design Issues

ISSUES IN THE COMBINATION OF NAEP AND SASS
Analytical Sampling Issues


The most important issues for NAEP and SASS related to the use
of the data are the production and maintenance of time series.

o For NAEP, a time series using the universes of schools with
grades 4, 8, and 12 for national estimates of student
performance in various subject areas is of paramount
importance.



o For SASS, development of a time series nationally
is as important as the primary cross-sectional goals:
school and teacher estimates for the public/private sector,
for the elementary/secondary sector, for states, and for
secondary school teacher estimates by field taught.

These primary goals are somewhat in conflict with each other,
and individually could lead to different sample designs. The
primary sampling question is how the analysis plan can be
used to determine how to develop the sample design.

1. Question 5: Will the integration necessitate design changes
that will shift emphasis from the primary goals of each of the
individual surveys? and

Question 7, part 2: Given that NAEP samples teachers of
students to describe teaching methods and SASS samples teachers
in schools to determine teacher characteristics, can these two
goals be achieved with a common sample?

Issue: What is the importance of the relational analysis
relative to the primary goals of the individual surveys? and,

Issue: In establishing a model for the relational analysis, one
must consider that there are different and varying influences
to consider. For some portion of the sample, a class or
subsample of students will have only one teacher (e.g. fourth
graders), and so some inferences can be made involving specific
teachers tied to clusters of students. For another portion of
the sample, a subsample of students in a specific grade will
have several teachers, and the degree of overlap between
teachers and students in a school will be very fuzzy (e.g. 12th
graders). Finally, some students in a school may have just
transferred in, whereas other students may have gone through
several grades in the same school where they are now sampled.



*Question numbers refer to numbers used in the document: Merger of NAEP and
School/Staffing Surveys
"What teachers to sample" is a function of what influences are
to be considered in the model. How much can be considered
realistically in a model? What is practical to collect? What
does one do to reflect all of these influences? Or is the
issue one of selecting only certain influences to be included
in the model?



The relational analysis can be conducted at several levels. In
a model of inputs related to outcomes, several sectors may be
important factors:



1. Effects of specific current teachers; 

2. Effects of past teachers, represented as a set; 

3. Effects of a particular school in terms of environment;

4. Effects of a particular school district in terms of
characteristics of the local population;

5. Subject matters covered in the past and present;

6. Instructional practices used in the past and present;

7. Other factors (e.g. demographics) which are needed as 
controls. 



It may also be important to consider characteristics of other
students in classes in which the sampled students are located,
as another set of environmental effects. A third class of
factors might relate to parents and other non-school or
non-teacher related items.



Some sampling decisions are related to the planned analyses.
Should a large sample of schools be taken within a county to
provide estimates of school district environment? Should a
small sample of schools be taken within a county to optimize
the school estimates nationally and by state? Should a large
sample of teachers be taken within a school, both to relate
teachers to students and also to provide a measure of school
environment? Should a small sample of teachers be taken within
a school to optimize the teacher estimates nationally and by state?

There are also some issues of trying to oversample certain 
subpopulations to make comparisons. We can oversample schools 
in certain types of school districts or counties to ensure 
large minority representation for comparisons. We can 
oversample teachers by area taught or by characteristics 
identified in a screening interview. We can oversample
students, and ultimately parents, again after a screening
interview, to represent minorities or other factors important
to a relational analysis. The decision regarding oversampling
of minorities is entirely a function of where it is most
important to the analysis to have comparisons of minorities to
the balance of the sample. If this is only important to the
relational analysis, the oversampling occurs at the last stage.
Other comparisons may demand oversampling of minorities at an
earlier stage.
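
As a rough illustration of what oversampling implies for estimation, the following Python sketch draws a stratified simple random sample with a higher sampling fraction in one stratum and attaches a base weight (stratum size divided by stratum sample size) to each selected unit. The stratum names, frame sizes, and sampling fractions are invented for the example; they are assumptions, not figures from NAEP or SASS.

```python
import random

def sample_with_oversampling(frame, rates, seed=0):
    """Draw a stratified simple random sample.

    frame: stratum label -> list of units in the stratum
    rates: stratum label -> sampling fraction in (0, 1]
    Returns a list of (unit, stratum, base_weight) tuples, where
    base_weight is the stratum size divided by the stratum sample size.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, units in frame.items():
        n = max(1, round(rates[stratum] * len(units)))
        weight = len(units) / n
        for unit in rng.sample(units, n):
            sample.append((unit, stratum, weight))
    return sample

# Hypothetical frame: 100 schools in the oversampled stratum, 900 elsewhere.
frame = {
    "oversampled": [f"H{i}" for i in range(100)],
    "balance": [f"B{i}" for i in range(900)],
}
rates = {"oversampled": 0.20, "balance": 0.05}  # assumed 4-to-1 oversampling

sample = sample_with_oversampling(frame, rates)
counts = {s: sum(1 for _, st, _ in sample if st == s) for s in frame}
total_weight = sum(w for _, _, w in sample)
```

The weights restore the frame total (here 1,000 schools), so national estimates remain usable while the oversampled stratum gains enough cases for comparisons. The same idea would apply at whichever stage (county, school, teacher, or student) the oversampling is introduced.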
Agenda Item: B

Issues in the Combination of NAEP and SASS
Respondent Burden

One of the stated objectives for combining NAEP and SASS is to reduce
the respondent burden. By using a common sample of schools for both
surveys, a significant reduction of burden at the school level may be
achieved.

The 1988 SASS will sample approximately 13,000 public and private
schools, 65,000 teachers, and 5,600 public school districts. If the
recommendations of the Alexander/James Study Group are implemented,
the NAEP sample would increase in size to include 700,000 students and
approximately 14,000 schools and 60,000 teachers. A common sample of
schools could reduce the number of schools by a factor of 1.5 to 2.
However, several additional points need to be cited.

o The assumed reduction in burden by combining the surveys may
really constitute just a shift in burden from many schools with
relatively light burden to fewer schools with substantially
increased burden. If the schools perceive this as increased
burden, will the quality of responses be lowered?

o Reducing the number of schools in this way is unrelated to teacher
burden. To the extent that the teacher samples for the two
surveys are non-overlapping, teacher burden will not be
substantially reduced.

o If a combined teacher sample is used, the burden on the
individual teacher who is responding to both the NAEP
and SASS data requests will increase significantly, perhaps
by as much as 50 percent.

o At some level, burden may become so large that we lose the
cooperation of our data providers. The 1987-88 SASS provides some
insights into this potential problem. 1) The sample in some small
states exceeds 50 percent of all schools. 2) In five large school
districts, more than 50 schools have been selected, and in New
York City 190 were selected. The Center may also anticipate that
[...] choose not to participate in an expanded
State-representative NAEP when the national student sample reaches
700,000.

Other approaches to controlling burden include:

o Control sample selection at the school level to ensure that a
school is only included every other survey cycle or every third
cycle.
o Incorporate matrix sampling into the questionnaire design for
SASS. However, this would limit the usefulness of SASS for
relational analysis.

o Reduce the questionnaire content and target questions at
relatively narrow topics of interest.
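
One way to picture the first approach in this list (a school is eligible only every other or every third cycle) is to assign each school permanently to a rotation group and let only one group be eligible per cycle. The Python sketch below is a hypothetical illustration: the school identifiers and the use of a CRC-32 hash to form groups are assumptions made for the example, not part of any NAEP or SASS procedure.

```python
import zlib

def rotation_group(school_id, period):
    """Assign a school to one of `period` rotation groups, deterministically."""
    return zlib.crc32(school_id.encode("utf-8")) % period

def eligible(school_id, cycle, period):
    """A school is eligible only in cycles that match its rotation group."""
    return cycle % period == rotation_group(school_id, period)

# Hypothetical frame of 1,000 schools; period = 2 means every other cycle.
schools = [f"school-{i:04d}" for i in range(1000)]
period = 2
by_cycle = {
    cycle: [s for s in schools if eligible(s, cycle, period)]
    for cycle in range(4)
}
```

Under this scheme no school is eligible in two consecutive cycles, and the same group returns after `period` cycles. Two cautions apply: restricting eligibility changes selection probabilities, so weights would need to reflect it, and the scheme conflicts with any plan to take large schools into the sample with certainty.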
Issues in the Combination of NAEP and SASS
Management

I. NAEP and SASS are operated under different management structures.

A. NAEP is operated as a grant with an external governance
structure.

1. Funding for NAEP comes under a special NAEP
line in the Education Department budget with
authorization and appropriation set by Congress.

2. Grant awards are made through a competitive
process, currently defined by general
Department regulations and in the future
by regulations specific to NAEP.

3. Decisions about the design and policies of
NAEP are, by statute, made by the NAEP
governing board, the Assessment Policy
Committee (APC) and its subcommittees -
e.g., Learning Area Committees, the
Background Review Committee, and the
Technical Advisory Committee. The OERI
Assistant Secretary is an ex officio member
of the APC.

4. Project activities are carried out by the grantee,
currently Educational Testing Service, and its
subcontractors - i.e., Westat has responsibility for
sampling and field operations.

5. Analyses and reports (including publication approval
and dissemination) are done by the grantee and, to a
lesser extent, secondary analysts.

6. The grantee develops a clearance package which is
reviewed by CES, FEDAC, and OMB.

7. Additional NAEP related planning activities are being
conducted by a Consortium on Assessment, organized by
the Council of Chief State School Officers and funded by
ED and NSF.

B. SASS is operated as an interagency transfer/agreement
with the Bureau of the Census.

1. Funding for SASS is part of the general funding for
CES in the ED budget.
2. Decisions about the design and policies of SASS
are made by the Education Department.

3. Planning activities have been carried out by CES,
by Rand under contract to CES, and by the Bureau
of the Census.

4. [...] has been done by CES and the Bureau of the
Census.

5. Field operations for 1988 will be conducted by the
Bureau of the Census.

6. [...] of the 1988 SASS data will be done
by CES and under competitive contract.

7. Publication approval and dissemination are determined
[...]

II. NAEP and SASS have certain goals in common.

A. Provide timely, useful information to a variety
of audiences.

B. Control operating costs.

C. Control respondent burden.

D. Be responsive to the interests and concerns of:

o [...] levels, and

o [...] Department, the U.S. Congress,
OMB, etc.

A. School Questionnaire: Most of the items on the NAEP school
[...]

(2) teacher supply and demand [...] administrator and [...]

B. Teacher Questionnaire: There is less overlap in the teacher
questionnaire [...]

1. NAEP's focus is on factors most related to assessment
[...] the classroom and instructional
practices in the subject matter area of the
assessment. (NAEP teacher sample consists of the
student's current teacher in the assessment subject.)

2. SASS provides more depth on teacher training,
experience and attitudes. (SASS teacher sample
consists of a random sample of all teachers
in the school.)

C. Student Assessment Instruments (NAEP only) 

IV. Coordination of NAEP, SASS, and NELS for 1988

A. CES reviewed school and teacher questionnaires from
NAEP, SASS, and NELS:88 and recommended changes for
increased consistency of related items. All parties
agreed to make the changes: ETS, Westat, Bureau of
the Census, and NORC.



B. CES recommended a target of zero school overlap in
samples for the three surveys in 1988. All parties
agreed and cooperated. Westat/ETS drew the NAEP sample
in June; NORC drew the NELS:88 sample in July based in
part on Westat/ETS information; and CES and the Bureau of
the Census are drawing the SASS samples based in part on
NAEP and NELS information provided by NORC. (SASS public
school sample was completed in August but the SASS private
school sample has not yet been completed.)

C. CES sent two letters to Chief State School Officers, a
June letter describing our plans for coordinating the
surveys and an August letter reporting near zero public
school overlap nationally and specific sample information
for the State.



V. Coordination of NAEP and SASS for 1990

A. CES staff to develop milestones for coordinated planning
and implementation of 1990 surveys.

B. CES staff to develop (1) analytic agenda and (2) model
for integrated samples for NAEP and SASS with both input
and review by outside people.

C. NAEP items to be developed with NAEP grantee and SASS
items to be developed by CES and Bureau of the Census.

1. NAEP items will be strongly influenced by
the Consortium of Assessment and other
scholarly/field input.
2. SASS items will be selected by ED in light
of scholarly/field input.

3. CES will be actively involved in the
coordination of these two developmental
processes.

a. CES to negotiate commitment to
coordination with 30 month grantee

b. CES to review all instruments to provide
coordination across [...] items

c. CES to monitor instrument development
and convene meetings to iron out problems

4. CES work with field coordinated across NAEP and SASS
projects.

a. State coordinators in field collections 

b. fTTTg 

c. Field input for design and instrumentation

d. State cooperative program 

D. NAEP grantee cooperation and effort needed to collect SASS
[...]/instruments in overlapping schools.

1. Design issues in overlapping schools

a. School Questionnaire: one instrument
or two? bridges (bench marks)?

b. Teacher Questionnaire: one instrument
or two? bridges (bench marks)?

c. Teacher samples within schools

1. school contact

2. possible overlap of NAEP and
SASS teacher samples



2. Data sharing agreements 



E. NAEP grantee and Bureau of the Census cooperation and
[...] implement overlapping [...]
VI. Management structure issues

A. Congressional action could change NAEP governance
structure - uncertain.

B. Budget levels will affect sample sizes, analysis efforts,
trade-offs, etc.

C. Field response could affect target samples as well as
response rates.

D. Field advice: Don't put two different data collections
in any school. If information is needed for two surveys
from any single school, then be sure to fully integrate
school contact and data collection in that school.
CENTER FOR THE STUDY OF EVALUATION
CENTER FOR RESEARCH ON EVALUATION,
STANDARDS AND STUDENT TESTING
UCLA GRADUATE SCHOOL OF EDUCATION

December 10, 1987



405 Hilgard Avenue
Los Angeles, California 90024-1521

(213) 825-4711
(213) 20C lu: 



TO: Members, Advisory Council on Education Statistics (ACES), CES
Emerson J. Elliott, Director, CES

RE: Summary and Recommendations from Meetings on Merging NAEP and
SASS, November 18 and 20, 1987

The enclosed materials constitute my report to ACES from the
November 18 and 20, 1987 meetings organized by the CRESST Quality
Indicators Study Group on merging NAEP and SASS. The packet
includes a summary and recommendations based on the discussions
at the two meetings and on the written statements provided by
meeting participants and other invited consultants. In
addition, it includes selected materials provided to participants
prior to and during the meeting, lists of meeting participants,
and the full texts of statements provided by participants and
discussants. I apologize for the amount of material; however,
it is my understanding that ACES members have had differential
exposure to the questions and issues that led to merger
discussions. Therefore, I decided to be inclusive, thereby
allowing the audience discretion in judging their information
needs.



To expedite consideration of the essential questions 
addressed by this activity and the recommendations it generated, 
a statement of the background of the meeting and discussion of 
the primary recommendations follows in this cover memorandum. 
The CRESST activity was in response to conflicting advice
received by the Director of CES. ACES had previously recommended
that a merger of NAEP and SASS proceed. This recommendation was
in keeping with the recommendations on linking data collections
from the report on alternatives for a national data system on
elementary and secondary education prepared by Hall, Jaeger,
Kearney and Wiley (December 20, 1985). Yet other segments of the
educational community questioned the advisability of the merger 
on a variety of technical, substantive, practical, and political 
grounds. 

The purpose of the meetings was to bring together persons 
knowledgeable about educational research, statistical, and policy 
analytic issues that CES's data collections (including NAEP, 
SASS, Longitudinal Studies) are intended to address to: 
a. Consider the range of issues that CES had already
identified and review its available documentation
regarding these issues;

b. Augment CES's prior analyses with other evidence that
bears on the perceived benefits and costs of the
proposed merger;

c. Assess the likely consequences (e.g., for knowledge
production, enhancing policy analysis capabilities,
improving or degrading the quality of data) of the
merger;

d. Recommend options with regard to the decision process on 
the possible merger and the steps that should be 
undertaken in advance of a final determination to 
proceed with the merger. 

Participants were provided in advance specific questions and 
issues that the meeting was intended to consider and a set of 
pertinent documents. Two 5-1/2 hour meetings were scheduled 
with a day in between to accommodate the schedules of the desired
participants and to allow time to prepare information from the
first day's discussion to assist the second day's deliberations.

Without going into detail, despite the diversity of
perspectives and interests represented in both days' meetings,
there was consistency in the basic issues that needed to be
addressed and considerable consensus about the primary
recommendations. Briefly, the list of issues is as follows:

1. What does "merger" mean and how comprehensive (with 
respect to instrumentation and to samples) should it be? 

2. What analytical purposes should guide any merger 
decisions? 



3. What are the likely consequences of alternatives with 
respect to respondent burden and costs? 

4. How does the question of the desirable/necessary
cycle/periodicity and timing of SASS (or parts of SASS)
interact with the above?

5. What sets of analytical exercises/special studies should
be undertaken to address the merger issue in both the short run
and the long run?

The recommendations that achieved a general consensus from 
the meetings and written statements are: 



1. A major merger of the questionnaires and samples from
NAEP and SASS should NOT be attempted in 1990. The risks of
overburdening NAEP in 1990 are too great; moreover, too little is
known about how SASS will actually function at this time to
assess the benefits and consequences of strong ties with NAEP.

2. Whether NAEP and SASS should merge in 1992 or 1994
warrants further study, including analyses of existing data from
the two surveys gathered through the 1988 data collection.

3. Regardless of the extensiveness of the eventual merger,
the analytical purposes that should guide merger efforts should
be those dealing with informing the policy analytic process
rather than enhancing capabilities to conduct school effects or
effectiveness research in an integrated national or
state-representative data base. An example of a policy analytic
purpose that could be served through a "merger" effort is the
gathering and maintenance of national (and perhaps state
representative) indicator series dealing with questions of access
and participation (e.g., which kinds of students receive
instruction in which kinds of schools from which kinds of
teachers?).

4. For the short term (e.g., 1990), a small set of teaching
and schooling conditions questions selected from SASS could be
administered with NAEP to enhance its ability to serve policy
analytic purposes. To this end, analytical work using past NAEP
collections of teacher and school characteristics as well as
other efforts to identify specific policy analytic purposes to be
served should be carried out in time to modify and augment the
1990 NAEP school and teacher characteristics questionnaires.

5. A three-year or even a four-year cycle for the major SASS
data collection should be considered with at least part of the
resource savings shifted to conducting special studies (e.g.,
longer term study of flow of teachers into and out of the
workforce for a panel of schools and districts; augmentation of
NAEP data collection in 1990; studies of the consequences of the
intensity of respondent burden and cost consequences of major
merger). Alternatively, the SASS instrumentation can be broken
up into smaller sets which could be fielded on different cycles
with perhaps a core set maintained on a more frequent cycle.
Spreading out the SASS cycle would also postpone collection
activities in ways that would place less strain on plans for the
1990 NAEP.

6. Postponing major merger discussions beyond 1990 provides
time and resources to consider (through design and other special
studies) the costs and benefits of developing a merged sampling
universe across the major data collections (including NELS as
well as NAEP and SASS).

7. Attention is needed to the benefits accrued at
the school level from participating in these surveys.
"Contributing to national well-being" is increasingly losing out
given the extensiveness of data collection demands and
competition from data collection with greater extrinsic rewards.
The above conveys only the tenor of the discussions and
written statements. Participants seemed genuinely concerned that
the primary purposes of NAEP and SASS not be sacrificed or
damaged by a hurried decision to merge the two. CES is
undertaking major modifications and extensions of its
data collection responsibilities over the next few years. Under
such circumstances the participants seemed to feel that time
devoted to fielding and reporting these collection efforts in an
effective and credible manner is critical. Discussions of
mergers of these data collections need to proceed at a more
deliberative pace than at present. There is just too much at
stake.

I hope that you find the enclosed materials informative. I 
look forward to meeting with you to clarify and discuss any 
aspects of the meetings, documents, and issues. 
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